


WORLD INTELLECTUAL PROPERTY ORGANIZATION 

International Bureau



INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)


(51) International Patent Classification 4: C12P 21/00, C07K 15/00, C12N 15/00



(11) International Publication Number: WO 90/02809

(43) International Publication Date: 22 March 1990 (22.03.90)


(21) International Application Number: PCT/US89/03731

(22) International Filing Date: 1 September 1989 (01.09.89)


(30) Priority data: 240,160, 2 September 1988 (02.09.88), US


(71) Applicant: PROTEIN ENGINEERING CORPORATION [US/US]; 765 Concord Avenue, Cambridge, MA 02138 (US).

(72) Inventors: LADNER, Robert, Charles; 3827 Green Valley Road, Ijamsville, MD 21754 (US). GUTERMAN, Sonia, K.; 20 Oakley Road, Belmont, MA 02178 (US).

(74) Agent: COOPER, Iver, P.; Browdy and Neimark, 419 Seventh St., N.W., Suite 300, Washington, DC 20004 (US).


(81) Designated States: AT, AT (European patent), AU, BB, BE (European patent), BF (OAPI patent), BG, BJ (OAPI patent), BR, CF (OAPI patent), CG (OAPI patent), CH, CH (European patent), CM (OAPI patent), DE, DE (European patent), DK, FI, FR (European patent), GA (OAPI patent), GB, GB (European patent), HU, IT (European patent), JP, KP, KR, LK, LU, LU (European patent), MC, MG, ML (OAPI patent), MR (OAPI patent), MW, NL, NL (European patent), NO, RO, SD, SE, SE (European patent), SN (OAPI patent), SU, TD (OAPI patent), TG (OAPI patent).


Published 

With international search report.


(54) Title: GENERATION AND SELECTION OF RECOMBINANT VARIED BINDING PROTEINS 
(57) Abstract 

In order to obtain a novel binding protein against a chosen target, DNA molecules, each encoding a protein comprising one of a family of similar potential binding domains and a structural signal calling for the display of the protein on the outer surface of a chosen bacterial cell, bacterial spore or phage (genetic package), are introduced into a genetic package. The protein is expressed and the potential binding domain is displayed on the outer surface of the package. The cells or viruses bearing the binding domains which recognize the target molecule are isolated and amplified. The successful binding domains are then characterized. One or more of these successful binding domains is used as a model for the design of a new family of potential binding domains, and the process is repeated until a novel binding domain having a desired affinity for the target molecule is obtained. In one embodiment, the first family of potential binding domains is related to bovine pancreatic trypsin inhibitor, the genetic pack-
GENERATION AND SELECTION OF RECOMBINANT VARIEGATED BINDING PROTEINS


Field of the Invention

This invention relates to development of novel binding proteins by an iterative process of mutagenesis, expression, chromatographic selection, and amplification.

Information Disclosure Statement

The amino acid sequence of a protein determines its three-dimensional (3D) structure, which in turn determines protein function (EPST63, ANFI73). The system of classification of protein structure of Schulz and Schirmer (SCHU79, ch. 5) is adopted herein.


The 3D structure of a protein is essentially unaffected by the identity of the amino acids at some loci; at other loci only one or a few types of amino acid are allowed (SHOR85, EXSE85, REID88). Generally, loci where wide variety is allowed have the amino acid side group directed toward the solvent, while limited variety is allowed where the side group is directed toward other parts of the protein (see also SCHU79, p169-171, and CREI84, p239-245, 314-315).

The secondary structure (helices, sheets, turns, loops) of a protein is determined mostly by local sequence. Certain amino acids tend to be correlated with certain secondary structures, and the commonly used Chou-Fasman rules (CHOU74, CHOU78a, CHOU78b) depend on these correlations. However, every amino acid type has been observed in helices and in both parallel and antiparallel sheets. Pentapeptides of identical sequence are found in different proteins; in some cases the conformations of the pentapeptides are very different (KABS84, ARGO87).

Turns and loops tolerate insertions and deletions more readily than do other secondary structures (RICH81, THOR88, SUTC87a); related proteins differ most in loops and turns.

Changing three residues in subtilisin from Bacillus amyloliquefaciens to be the same as the corresponding residues in subtilisin from B. licheniformis produced a protease that had nearly the same activity as the subtilisin from the latter organism, even though 82 differences remained in the sequences. The three residues changed were chosen because they were the only differences within 7 Ångstroms (Å) of the active site (WELL87a).

Schulz and Schirmer summarize many observations on the binding of proteins to other molecules (SCHU79, p98-105). For example, haemoglobin alpha chains bind very tightly to haemoglobin beta chains (delta G more negative than -11.0 kcal/mole); antibodies bind tightly to antigens ($K_D$ values range from $10^{-6}$ to $10^{-14}$ M, where $K_D$ is the dissociation constant, equal to [A][B]/[A:B]); basic bovine pancreatic trypsin inhibitor (BPTI) binds tightly to trypsin ($K_D = 6.0 \times 10^{-14}$ M (TSCH87), delta G = -18.0 kcal/mole); and avidin binds to biotin ($K_D = 1.3 \times 10^{-15}$ M (CREI84, p362)). In each case the binding results from complementarity of the surfaces that come into contact: bumps fit into holes, unlike charges come together, dipoles align, and hydrophobic atoms contact other hydrophobic atoms. Although bulk water is excluded, individual water molecules are frequently found filling space in intermolecular interfaces; these waters usually form hydrogen bonds to one or more atoms of the protein or to other bound water.
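The $K_D$ and delta G values quoted for BPTI and trypsin are linked by the standard thermodynamic relation delta G = RT ln $K_D$, with $K_D$ taken relative to a 1 M standard state. The minimal sketch below assumes a temperature of 25 °C (298.15 K), which the text does not state, and checks that $K_D = 6.0 \times 10^{-14}$ M corresponds to roughly -18.0 kcal/mole.

```python
import math

# Gas constant in kcal/(mol*K); 298.15 K (25 degrees C) is an assumed temperature.
R_KCAL = 1.987e-3
T_KELVIN = 298.15

def binding_free_energy(kd_molar, temperature=T_KELVIN):
    """Standard free energy of binding, delta G = RT ln(K_D), in kcal/mole.

    K_D is relative to a 1 M standard state, so tighter binding
    (smaller K_D) gives a more negative delta G.
    """
    return R_KCAL * temperature * math.log(kd_molar)

def kd_from_free_energy(delta_g_kcal, temperature=T_KELVIN):
    """Inverse relation: K_D = exp(delta G / RT), in mol/liter."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature))

# BPTI:trypsin, K_D = 6.0e-14 M (value quoted in the text)
print(round(binding_free_energy(6.0e-14), 1))  # -18.0
```

The same relation shows why the haemoglobin figure matters: a delta G more negative than -11.0 kcal/mole corresponds, at the assumed temperature, to a $K_D$ below about $10^{-8}$ M.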

The factors affecting protein binding are known (CHOT75, CHOT76, SCHU79, p98-107, and CREI84, ch. 8), but designing new complementary surfaces has proved difficult. Although some rules have been developed for substituting side groups (SUTC87b), the side groups of proteins are floppy and it is difficult to predict what conformation a new side group will take. Further, the forces that bind proteins to other molecules are all relatively weak, and it is difficult to predict the effects of these forces. Hence, it is difficult to design superior binding proteins based on theory alone (QUIO87).

Enzyme-substrate affinity, however, has fortuitously been increased by protein engineering (WILK84). A point mutant of tyrosyl tRNA synthetase of Bacillus stearothermophilus exhibits a 100-fold increase in affinity for ATP. Substitution of one amino acid for another at a surface locus may profoundly alter binding properties of the protein other than substrate binding, without affecting the tertiary structure of the protein. For example, in sickle-cell haemoglobin the change of the surface residue E6 to V in the beta chains causes deoxyhaemoglobin S to form fibers through self-binding (DICK83, p125-145); the tertiary and quaternary structures of the haemoglobin are not changed (PADL85, WISH75, WISH76).
Changing a single amino acid in BPTI greatly reduces its binding to trypsin, but some of the new molecules retain the parental characteristics of binding to and inhibiting chymotrypsin, while others exhibit new binding to elastase (TANK77; TSCH87). Changes of single amino acids on the surface of the lambda Cro repressor greatly reduce its affinity for the natural operator OR3, but greatly increase the binding of the mutant protein to a mutant operator (EISE85). Thus changing the surface of a binding protein may alter its specificity without abolishing binding activity.

The recently developed techniques of "reverse genetics" have been used to produce single specific mutations at precise base-pair loci (OLIP86, OLIP87, and AUSU87). Mutations are generally detected by sequencing and in some cases by loss of wild-type function. These procedures allow researchers to analyze the function of each residue in a protein (MILL88) or of each base pair in a regulatory DNA sequence (CHEN88). In these analyses, the norm has been to strive for the classical goal of obtaining mutants carrying a single alteration (AUSU87).

Reverse genetics is often applied to coding regions to determine which residues are most important to protein structure and function; isolation of a single mutant at each residue of the protein gives an initial estimate of which residues play crucial roles.

Prior to the method of the present invention, two general approaches have been developed to create novel mutant proteins through reverse genetics. In one approach, dubbed "protein surgery" (DILL87), a specific substitution is introduced at a single protein residue to determine the effects of that substitution on structure and function (CRAI85, RAOS87, BASH87). However, many desirable protein alterations require multiple amino acid substitutions and thus are not accessible through single base changes or even through all possible amino acid substitutions at any one residue.

The other approach has been to generate randomly a variety of mutants at many loci within a cloned gene using mutagenic chemicals or radiation. The specific location and nature of each change are then determined by DNA sequencing (PAKU86). This approach is limited by the number of colonies that can be examined. Also, it does not take advantage of any knowledge of the protein structure and its relationship to binding activity.

Progress toward rules governing substitutions of amino acids (ULME83) has been greatly hampered by the extensive effort involved in using either method and by the practical limitations on the number of colonies that can be inspected (ROBE86).

The term "saturation mutagenesis" with reference to synthetic DNA is generally taken to mean generation of a population in which: a) every possible single-base change within a fragment of a gene or DNA regulatory region is represented, and b) most mutant genes contain only one mutation. Thus a set of all possible single mutations for a 6 base pair length of DNA comprises a population of 18 mutants (3 alternative bases at each of 6 positions). Oliphant et al. (OLIP86) and Oliphant and Struhl (OLIP87) have demonstrated ligation and cloning of highly degenerate oligonucleotides and have applied saturation mutagenesis to the study of promoter sequence and function. They suggest that similar methods could be used to study genetic expression of proteins, but they do not say how to: a) choose protein residues to vary, or b) select or screen mutants with desirable properties.
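The count of 18 follows because each of the 6 positions can change to any of the 3 other bases. The short sketch below enumerates such a saturation set for an arbitrary, hypothetical 6-bp fragment; it is an illustration of the definition, not a protocol from the cited work.

```python
BASES = "ACGT"

def single_base_mutants(seq):
    """All sequences that differ from seq by exactly one base substitution.

    Each position can change to any of the 3 other bases, so a
    fragment of length n yields 3 * n single mutants.
    """
    mutants = []
    for i, original in enumerate(seq):
        for base in BASES:
            if base != original:
                mutants.append(seq[:i] + base + seq[i + 1:])
    return mutants

# Hypothetical 6-bp fragment, used only for illustration
mutants = single_base_mutants("TATAAT")
print(len(mutants))  # 18
```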

Reidhaar-Olson and Sauer (REID88) have used synthetic degenerate oligonucleotides to vary simultaneously two or three residues through all twenty amino acids in the dimer interface of the cI repressor from bacteriophage lambda. They give no discussion of the limits on how many residues could be varied at once, nor do they mention the problem of unequal abundance of DNA encoding different amino acids. They looked for proteins that either had wild-type dimerization or did not dimerize. They did not seek proteins having novel binding properties and did not report any.
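The "unequal abundance" problem can be made concrete with the standard genetic code. Assuming, for illustration only, that all three bases of a varied codon are fully randomized (an NNN codon), each amino acid appears in proportion to the number of codons that encode it. The sketch below builds the standard code table and reports those proportions.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def nnn_abundance():
    """Fraction of fully random (NNN) codons encoding each amino acid or stop."""
    counts = Counter(CODON_TABLE.values())
    return {aa: n / 64 for aa, n in counts.items()}

freqs = nnn_abundance()
# Leu, Ser and Arg each have 6 codons; Met and Trp have only 1 each.
print(freqs["L"] / freqs["W"])  # 6.0
print(freqs["*"])  # 0.046875 (3 of 64 codons are stops)
```

Because the bias multiplies across loci, varying three such codons makes an all-Leu variant 6 × 6 × 6 = 216 times as abundant as an all-Trp variant in the DNA population.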

Several researchers have designed and synthesized proteins de novo. These designed proteins are small, and most have been synthesized in vitro as polypeptides rather than genetically. Gutte and colleagues have made a polypeptide that binds DDT in 55% ethanol (MOSE83). Recently Moser et al. (MOSE87) reported genetic expression in E. coli both of the designed 24-residue DDT-binding protein and of fusions of the DDT-binding sequence to LacZ. They state that design of biologically active proteins is currently impossible.

Erickson et al. (ERIC86) have designed and synthesized a series of proteins, named betabellins, that are meant to have beta sheets. They suggest use of polypeptide synthesis with mixed reagents to produce several hundred analogous betabellins, and use of a column to recover analogues with high affinity for a chosen target compound bound to the column. They envision successive rounds of mixed synthesis of variant proteins and purification by specific binding. They do not discuss how residues should be chosen for variation. Because proteins cannot be amplified, the researchers must sequence the recovered protein to learn which substitutions improve binding. The researchers must therefore limit the level of diversity so that each variety of protein will be present in sufficient quantity for the isolated fraction to be sequenced.

Methods have been developed to separate cells through their affinity to various substances. Methods applied to animal cells reveal common problems: a) non-specific interactions between cells and affinity supports, and b) irreversible binding of cells to affinity matrices (BONN85).

Ferenci and collaborators have published a series of papers on the chromatographic isolation of mutants of the maltose-transport protein LamB of E. coli (WAND79, FERE80a, FERE80b, FERE80c, FERE82a, FERE82b, FERE83, CLUN84, FERE86a, FERE86b, FERE86c, FERE87a, FERE87b, HEIN87, and HEIN88). The papers report that spontaneous and induced mutants at the lamB genetic locus can be isolated by chromatography over a column supporting immobilized maltose, maltodextrins, or starch. The reports speculate that other applications are possible, but specifically mention only the elucidation of the residues responsible for the selectivity of the maltodextrin pore or similar pore proteins. The mutant proteins were non-chimeric, and no attempt was made to obtain binding to a new target.
Both FERE86a and CLUN84 point up the difficulties of working with live bacteria that can metabolize chemicals and change their physiological behavior during the chromatographic experiment.

A fragment of a heterologous gene can be introduced into bacteriophage f1 gene III (SMIT85). If the inserted gene preserves the original reading frame, expression of the altered gene III causes an inserted domain to appear in the gene III protein. The resulting f1 virions are adsorbed by an antibody against the protein encoded by the heterologous DNA. The phage were eluted at pH 2.2 and retained some infectivity. However, the single copy of f1 gene III was used for insertion of the heterologous gene, so that all copies of gene III protein were affected; infectivity of the resultant phage was reduced 25-fold.

Smith presented his method as a way to isolate cloned genes using antibodies to the gene products. He made no mention of mutagenizing the inserted genetic material or of inducing novel binding properties in the inserted protein domain.

A fragment of the repeat region of the circumsporozoite protein from Plasmodium falciparum has been expressed on the surface of M13 as an insert in the gene III protein (CRUZ88). The recombinant phage were both antigenic and immunogenic in rabbits. The authors do not suggest mutagenesis of the inserted material.
Gene fragments coding for hepatitis B virus (HBV) antigens have been fused to fragments of lamB, and if the fusion is in a region coding for exposed domains of LamB, the HBV antigens appear on the cell surface and are immunogenic (CHAR87). Charbit et al. (CHAR87) suggest use of these engineered strains for development of a live bacterial vaccine; they did not suggest mutagenesis of the fused heterologous gene fragments, nor development of binding capabilities.

Ladner, US Patent No. 4,704,692, "Computer Based System and Method for Determining and Displaying Possible Chemical Structures for Converting Double- or Multiple-Chain Polypeptides to Single-Chain Polypeptides," describes a design method for converting proteins composed of two or more chains into proteins of fewer polypeptide chains, but with essentially the same 3D structure. There is no mention of variegated DNA and no genetic selection. Ladner and Bird, WO88/01649 (publ. March 10, 1988), disclose the specific application of computerized design of linker peptides to the preparation of single chain antibodies.

Ladner, Glick and Bird, WO88/06630 (publ. 7 Sept. 1988) (LGB), speculate that diverse single chain antibody domains may be screened for binding to a particular antigen by varying the DNA encoding the combining determining regions of a single chain antibody, subcloning the SCAD gene into the gpV gene of phage lambda so that a SCAD/gpV chimera is displayed on the outer surface of the phage, and selecting phage which bind to the antigen through affinity chromatography. The only antigen mentioned is bovine growth hormone. No other binding molecules, targets, carrier organisms, or outer surface proteins are discussed. Nor is there any mention of the method or degree of mutagenesis.

Ladner and Bird, WO88/06601 (publ. 7 September 1988), suggest that single chain "pseudodimeric" repressors (DNA-binding proteins) may be prepared by mutating a putative linker peptide followed by in vivo selection, and that mutation and selection may be used to create a dictionary of recognition elements for use in the design of asymmetric repressors. The repressors are not displayed on the outer surface of an organism.

No admission is made that any cited reference is prior art or pertinent prior art, and the dates given are those appearing on the reference and may not be identical to the actual publication date.

SUMMARY OF THE INVENTION 

This invention relates to the construction, expression, and selection of mutated genes that specify novel proteins with desirable binding properties, as well as to these proteins themselves. The substances bound by these proteins, hereinafter referred to as "targets", may be, but need not be, proteins. Targets may include other biological or synthetic macromolecules as well as organic and inorganic molecules.

The novel binding proteins may be obtained: 1) by mutating a gene encoding a known binding protein within the subsequence encoding a known binding domain, 2) by taking such a subsequence of the gene for a first protein and combining it with all or part of a gene for a second protein (which may or may not itself be a known binding protein), 3) by mutating a gene encoding a protein which, while not possessing a known binding activity, possesses a secondary or higher structure that lends itself to binding activity (clefts, grooves, etc.), or 4) by mutating a gene encoding a known binding protein but not in the subsequence known to cause the binding. The protein from which the novel binding protein is derived need not have any specific affinity for the target material.

In one embodiment, the invention relates to:

(a) preparing a variegated population of replicable genetic packages, each package including a nucleic acid construct coding on expression for an outer-surface-displayed potential binding protein comprising (i) a structural signal directing the display of the protein on the outer surface of the package and (ii) a potential binding domain for binding said target, where a plurality of different potential binding domains are displayed by the individual packages,

(b) causing the expression of said protein and the display of said protein on the outer surface of such packages,

(c) contacting the packages with target material so that the potential binding domains of the proteins and the target material may interact, and separating packages bearing a potential binding domain that succeeds in binding the target material from packages that do not so bind,

(d) recovering and replicating at least one package bearing a successful binding domain,

(e) determining the amino acid sequence of the successful binding domain of a genetic package which bound to the target material,

(f) preparing a new variegated population of replicable genetic packages according to step (a), the parental potential binding domain for the potential binding domains of said new packages being a successful binding domain whose sequence was determined in step (e), and repeating steps (b)-(e) with said new population, and, when a package bearing a binding domain of desired binding characteristics is obtained,

(g) abstracting the gene encoding the desired binding domain from the genetic package and placing it into a suitable expression system.

(The binding domain may then be expressed as a unitary protein, or as a domain of a larger protein.)
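Steps (a) through (g) form a loop in which each round's successful binding domain becomes the parent of the next round. The following is a minimal, purely in-silico sketch of that loop under loudly stated assumptions: sequences are strings of one-letter amino acid codes, the physical affinity separation of steps (b)-(d) is replaced by a hypothetical scoring function (`toy_affinity`, which merely counts matches to a hypothetical "ideal" binder), and the parent sequence, loci, population size and "keep the parent unless the winner is no worse" rule are all arbitrary illustrative choices. It is not the laboratory procedure itself.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def variegate(parent, loci, n_variants, rng):
    """Step (a): daughters identical to the parent except at the chosen loci."""
    population = []
    for _ in range(n_variants):
        daughter = list(parent)
        for i in loci:
            daughter[i] = rng.choice(AMINO_ACIDS)
        population.append("".join(daughter))
    return population

def toy_affinity(seq, ideal):
    """Hypothetical stand-in for binding: residues matching an 'ideal' binder.

    Real selection measures binding to the target; this score only makes the loop runnable.
    """
    return sum(a == b for a, b in zip(seq, ideal))

def forced_evolution(parent, loci_schedule, ideal, goal, n_variants=2000, seed=0):
    """Run variegation cycles, each seeded with the best binder of the previous one."""
    rng = random.Random(seed)
    history = [toy_affinity(parent, ideal)]
    for loci in loci_schedule:
        population = variegate(parent, loci, n_variants, rng)
        # Steps (b)-(d): display, contact the target, recover the best binder.
        best = max(population, key=lambda s: toy_affinity(s, ideal))
        # Steps (e)-(f): the sequenced winner becomes the next parent if it is no worse.
        if toy_affinity(best, ideal) >= history[-1]:
            parent = best
        history.append(toy_affinity(parent, ideal))
        if history[-1] >= goal:
            break  # Step (g) would now move the gene into an expression system.
    return parent, history

# Hypothetical 10-residue parent, three cycles varying two loci each.
best, history = forced_evolution("GSAGSAGSAG", [(0, 1), (4, 5), (8, 9)],
                                 ideal="WSAGYFGSAR", goal=10)
print(best, history)
```

Varying only a few loci per cycle keeps each round's sequence space small enough to sample thoroughly, which mirrors the document's emphasis on choosing which residues to vary rather than mutating the whole gene.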

The invention further relates to a method of preparing a mixed population of replicable genetic packages in which each package includes a gene expressing a potential binding protein in such a manner that the protein is presented on the outer surface of the package. This method comprises:

i) preparing a variegated population of DNA inserts, each of which comprises a first sequence which codes on expression for a potential binding domain and a second sequence encoding a signal directing that the encoded protein be displayed on the outer surface of a chosen replicable genetic package, and

ii) incorporating the resulting population of DNA constructs into the chosen replicable genetic packages to produce a population of replicable genetic packages.

In a preferred embodiment, the potential-binding-protein-encoding inserts are incorporated into a gene encoding an outer-surface protein of the replicable genetic package.

The invention encompasses the design and synthesis of variegated DNA encoding a family of potential binding proteins characterized by constant and variable regions, said proteins being designed with a view toward obtaining a protein that binds a predetermined target.

For the purposes of this invention, the term "potential binding protein" refers to a protein encoded by one species of DNA molecule in a population of variegated DNA wherein the region of variation appears in one or more subsequences encoding one or more segments of the polypeptide having the potential of serving as a binding domain for the target substance.

From time to time, it may be helpful to speak of the "parent sequence" of the variegated DNA. When the novel binding domain sought is an analogue of a known binding domain, the parent sequence is the sequence that encodes the known binding domain. The variegated DNA will be identical with this parent sequence at most loci, but will diverge from it at chosen loci. When a potential binding domain is designed from first principles, the parent sequence is a sequence which encodes the amino acid sequence that has been predicted to form the desired binding domain, and the variegated DNA is a population of "daughter DNAs" that are related to that parent by a high degree of sequence similarity.

The fundamental principle of the invention is one of forced evolution. The efficiency of the forced evolution is greatly enhanced by careful choice of which residues are to be varied. The 3D structure of the potential binding domain is a key determinant in this choice. First, a set of residues that can simultaneously contact one molecule of the target is identified. Then all or some of the codons encoding these residues are varied simultaneously to produce a variegated population of DNA. The variegated population of DNA is used to transform cells so that a variegated population of genetic packages is produced.

The mixed population of genetic packages containing genes encoding possible binding proteins is enriched for packages containing genes that express proteins that in fact bind to the target ("successful binding domains"). After one or more rounds of such enrichment, one or more of the chosen genes are examined and sequenced. If desired, new loci of variation are chosen. The selected daughter genes of one generation then become the parent sequences for the next generation of variegated DNA, beginning the next "variegation cycle." Such cycles are continued until a protein with the desired target affinity is obtained.
The appended claims are hereby incorporated by reference into this specification as an enumeration of the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic showing the relationships between various types of Binding Domains (BD).

Figure 2 is a flow chart showing the major steps used to create a novel protein with affinity for a predetermined target.

Figure 3 is a schematic of a PBD contacting a molecule of target material.

Figure 4 is a schematic of the construction of pLG3 from M13mp18 and pBR322.

Figure 5 is a schematic of the construction of pLG7 from pLG3 and synthetic DNA.

DETAILED DESCRIPTION OF THE INVENTION 

Sec. 0.1: Overview:

The present invention separates mutated genes that specify novel proteins with desirable binding properties from closely related genes that specify proteins with no or undesirable binding properties, by: 1) arranging that the product of each mutated gene be displayed on the outer surface of a replicable genetic package that contains the gene, and 2) using affinity separation incorporating a desirable target material to enrich the population of packages for those packages containing genes specifying proteins with improved binding to that target material.

Let $K_D(x,y)$ be a dissociation constant,

$$K_D(x,y) = \frac{[x][y]}{[x:y]}$$

For the purposes of the appended claims, a protein P is a binding protein if

(1) for one molecular, ionic or atomic species A, the dissociation constant $K_D(P,A) < 10^{-6}$ moles/liter, and

(2) for a different molecular, ionic or atomic species B, $K_D(P,B) > 10^{-1}$ moles/liter.

As a result of these two conditions, the protein P exhibits specificity for A over B, and a minimum degree of affinity (or avidity) for A.
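The definition above reduces to two threshold tests on dissociation constants. The sketch below encodes the $K_D$ formula and both conditions, with the thresholds taken from the text; the function names and the example value for species B are illustrative assumptions.

```python
# Thresholds from the definition above, in moles/liter.
KD_MAX_FOR_A = 1e-6   # condition (1): K_D(P, A) < 1e-6
KD_MIN_FOR_B = 1e-1   # condition (2): K_D(P, B) > 1e-1

def dissociation_constant(conc_x, conc_y, conc_complex):
    """K_D(x, y) = [x][y] / [x:y], from equilibrium concentrations in mol/liter."""
    return conc_x * conc_y / conc_complex

def is_binding_protein(kd_a, kd_b):
    """True if P binds species A tightly and a different species B only weakly."""
    return kd_a < KD_MAX_FOR_A and kd_b > KD_MIN_FOR_B

# BPTI vs. trypsin (K_D = 6.0e-14 M) as species A; a hypothetical
# non-binding species B with K_D = 1.0 M.
print(is_binding_protein(6.0e-14, 1.0))  # True
print(is_binding_protein(1e-5, 1.0))     # False: affinity for A too low
```

Both conditions must hold: tight binding to A alone does not qualify P unless it also binds some other species B only weakly, which is what gives the definition its specificity requirement.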

When a domain of a protein is primarily responsible for the protein's ability to specifically bind a chosen target, it is referred to herein as a "binding domain" (BD). We engineer the appearance of a stable protein domain, denoted as an "initial potential binding domain" (IPBD), on the surface of a genetic package. The present invention is concerned with the expression of numerous, diverse, variant "potential binding domains" (PBD), all related to a "parental potential binding domain" (PPBD) such as the binding domain of a known binding protein, and with selection and amplification of the genes encoding the most successful mutant PBDs. An IPBD is chosen as PPBD to the first round of variegation. Selection-through-binding isolates one or more "successful binding domains" (SBD). An SBD from one round of variegation and selection-through-binding is chosen to be the PPBD for the next round. The invention is not, however, limited to proteins with a single BD, since the method may be applied to any or all of the BDs of the protein, sequentially or simultaneously. The relationships of the various BDs are illustrated in Figure 1.

The term "variegated DNA" refers to a population of molecules that have the same base sequence through most of their length, but that vary at a limited number of defined loci, preferably 5-10 codons. A molecule of variegated DNA can be introduced into a plasmid so that it constitutes part of a gene (OLIP86, OLIP87, AUSU87, REID88). When plasmids containing variegated DNA are used to transform bacteria, each cell makes a version of the original protein. Each colony of bacteria may produce a different version from any other colony. If the variegations of the DNA are concentrated at loci known to be on the surface of the protein or in a loop, a population of proteins will be generated, many members of which will fold into roughly the same 3D structure as the parent protein. The specific binding properties of each member, however, may differ from those of every other member. It remains to sort out the colonies containing genes for proteins with desirable binding properties from those that do not exhibit the desired affinities.
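The preferred range of 5-10 varied codons already implies a very large sequence space. The sketch below assumes, for illustration, that every varied codon can encode any of the twenty amino acids, and counts the distinct binding-domain sequences this allows.

```python
def protein_diversity(n_codons, alphabet_size=20):
    """Distinct sequences when each of n varied codons may encode any amino acid."""
    return alphabet_size ** n_codons

for n in range(5, 11):
    print(f"{n} varied codons: {protein_diversity(n):.1e} possible binding domains")
```

Even at the low end of the range there are 3.2 million possible variants, and about $10^{13}$ at ten codons, far more than the number of colonies that could be inspected one by one; this is why sorting relies on affinity separation rather than on examining colonies individually.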

A "single-chain antibody" is a single-chain polypeptide comprising at least 200 amino acids, said amino acids forming two antigen-binding regions connected by a peptide linker that allows the two regions to fold together to bind the antigen. Either the two antigen-binding regions must be variable domains of known antibodies, or they must (1) each fold into a beta barrel of nine strands that are spatially related in the same way as are the nine strands of known antibody variable light or heavy domains, and (2) fit together in the same way as do the variable domains of said known antibody. Generally speaking, this will require that, with the exception of the amino acids corresponding to the hypervariable region, there is at least 88% homology with the amino acids of the variable domain of a known antibody.

The term "affinity separation means" includes, but is not limited to: a) affinity column chromatography, b) batch elution from an affinity matrix material, c) batch elution from an affinity material attached to a plate, d) fluorescence-activated cell sorting, and e) electrophoresis in the presence of target material. "Affinity material" is used to mean a material with affinity for the material to be purified, called the "analyte". In most cases, the association of the affinity material and the analyte is reversible, so that the analyte can be freed from the affinity material once the impurities are washed away.

Affinity column chromatography, batch elution from an affinity matrix material held in some container, and batch elution from a plate are very similar and hereinafter will be treated under "affinity chromatography."




Fluorescence-activated cell sorting (FACS) involves use of an affinity material that is fluorescent per se or is labeled with a fluorescent molecule. Current commercially available cell sorters require 800 to 1000 molecules of fluorescent dye, such as Texas red, bound to each cell. FACS can sort $10^3$ cells or viruses/sec.

Electrophoretic affinity separation involves electrophoresis of viruses or cells in the presence of target material, wherein the binding of said target material changes the net charge of the virus particles or cells. It has been used to separate bacteriophages on the basis of charge (SERW87).

The present invention makes use of affinity
separation of bacterial cells or bacterial viruses (or
other genetic packages) to enrich a population for
those cells or viruses carrying genes that code for
proteins with desirable binding properties.


In the present invention, the words "select" and
"selection" are used exclusively in the genetic sense,
i.e. a biological process whereby a phenotypic
characteristic is used to enrich a population for those
organisms displaying the desired phenotype.

The process of the present invention comprises
three major parts:

I. design and production of a replicable
genetic package (GP) that displays an IPBD on
the surface of the GP, denoted GP(IPBD),

II. design and implementation of an affinity
separation process that separates GP(IPBD)s
that bind to a known affinity molecule from
wild-type GPs or GP(IPBD“)s, neither of which
binds the known affinity molecule, and

III. design and implementation of a genetic
variegation method, denoted structure-directed
mutagenesis, wherein a population of
10^6 or more different GP(PBD)s, denoted
GP(vgPBD), is produced.


One affinity separation is called a "separation cycle";
one pass of variegation followed by as many separation
cycles as are needed to isolate an SBD is called a
"variegation cycle". The amino acid sequence of one
SBD from one round becomes the PPBD to the next
variegation cycle. We perform variegation cycles
iteratively until the desired affinity and specificity
of binding between an SBD and chosen target are
achieved.
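The nesting of separation cycles inside variegation cycles can be summarized as a control loop. This is a toy model, not a laboratory protocol: `variegate`, `separate`, and `good_enough` are hypothetical callables standing in for structure-directed mutagenesis, one affinity separation, and the user's affinity/specificity criterion. The sketch assumes that the most abundant survivor is taken as the SBD and that domains are represented by hashable values.

```python
from collections import Counter
from typing import Callable, Hashable, List, TypeVar

D = TypeVar("D", bound=Hashable)   # a binding domain; must be hashable


def run_variegation_cycles(
    ppbd: D,
    variegate: Callable[[D], List[D]],       # one pass of variegation
    separate: Callable[[List[D]], List[D]],  # one separation cycle
    good_enough: Callable[[D], bool],        # desired affinity/specificity?
    max_separations: int = 5,
    max_variegations: int = 10,
) -> D:
    for _ in range(max_variegations):
        population = variegate(ppbd)
        # Separation cycles: repeat until a single clone remains (or give up).
        for _ in range(max_separations):
            population = separate(population)
            if len(set(population)) <= 1:
                break
        if not population:
            raise RuntimeError("no binders survived separation")
        # Take the most abundant survivor as this cycle's SBD.
        sbd = Counter(population).most_common(1)[0][0]
        if good_enough(sbd):
            return sbd
        ppbd = sbd                           # the SBD becomes the next PPBD
    raise RuntimeError("desired binding not reached")


# Toy usage: a "domain" is just an integer affinity score.
result = run_variegation_cycles(
    0,
    variegate=lambda p: [p - 1, p, p + 1, p + 2],
    separate=lambda pop: [x for x in pop if x == max(pop)],
    good_enough=lambda s: s >= 6,
)
```

In the toy run, each variegation cycle raises the score by two, so three cycles are needed to reach the threshold of 6.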


Part I is a strain construction in which we deal
with a single IPBD sequence. Variability may be
introduced into DNA subsequences adjacent to the ipbd
subsequence and within the osp-ipbd gene so that the
IPBD will appear on the GP surface. A molecule, such
as an antibody, having high affinity for correctly
folded IPBD is used to: a) detect IPBD on the GP
surface, b) screen colonies for display of IPBD on the
GP surface, or c) select GPs that display IPBD from a
population, some members of which might display IPBD on
the GP surface. In one preferred embodiment, Part I of
the process involves:

1) choosing a GP such as a bacterial cell (Sec.
1.1.1), bacterial spore (Sec. 1.2.1), or phage (Sec. 1.3.1)
having a suitable outer surface protein (Secs.
1.1.3, 1.2.3, and 1.3.3),

2) choosing a stable IPBD (Sec. 2),

3) designing an amino acid sequence that: a)
includes the IPBD as a subsequence and b) will
cause the IPBD to appear on the GP surface (Secs.
1.1.2, 1.2.2, 1.3.2, and 4),

4) engineering a gene, denoted osp-ipbd, that: a)
codes for the designed amino acid sequence, b)
provides the necessary genetic regulation, and c)
introduces convenient sites for genetic
manipulation (Secs. 4.1, 4.2, 4.3, 5.1, and 5.2),

5) cloning the osp-ipbd gene into the GP (Sec.
6.1), and

6) harvesting the transformed GPs (Sec. 7) and
testing them for presence of IPBD on the GP
surface (Sec. 8); this test is performed with an
affinity molecule having high affinity for IPBD,
denoted AfM(IPBD).


In another preferred embodiment, Part I of the process
involves:

1) and 2) as above,

3) designing a DNA sequence that: a) encodes the
IPBD as a subsequence, b) contains suitable
restriction sites so that random DNA may be
operably linked to the ipbd gene fragment, and c)
provides the necessary genetic regulation; this
DNA sequence is called a "display probe" (Secs.
1.1.4, 1.2.4, 1.3.4, and 4),

4) constructing that display probe,

5) cloning the display probe into the OCV and
amplifying it in a suitable host,

6) cloning random or pseudorandom DNA into one of
the restriction sites provided in the display
probe (Sec. 6.2), whereby the random or
pseudorandom DNA functions as a potential osp, and

7) harvesting GPs (Sec. 7) and screening colonies of
the transformed GPs for presence of IPBD on the GP
surface; this screening is performed with an
affinity molecule having high affinity for IPBD,
denoted AfM(IPBD) (Sec. 8); or, alternatively,

8) selecting GPs that display IPBD by use of an
affinity separation using AfM(IPBD) (Sec. 8).

Once a GP(IPBD) is produced, it can be used many
times as the starting point for developing different
novel proteins that bind to a variety of different
targets. The knowledge of how we engineer the
appearance of one IPBD on the surface of a GP can be
used to design and produce other GP(IPBD)s that display
different IPBDs.

Although Part I deals with only a single IPBD,
many preparations are made for Part III, where we
introduce numerous mutations into the potential binding
domain. References to PBD or pbd in Part I reflect
this preparatory intent.
In Part II we optimize separation of GP(IPBD) from
wild-type GP, denoted wtGP, based on the affinity of
IPBD for AfM(IPBD), and establish the sensitivity of the
affinity separation process. In a preferred
embodiment, Part II of the process of the present
invention involves:

1) preparing affinity columns bearing AfM(IPBD) at
various densities of AfM(IPBD)/(volume of matrix)
(Sec. 10.1),

2) preparing GP(IPBD)s with various amounts of
IPBD per GP,

3) picking a gradient regime for eluting the
columns (Sec. 10.1),

4) determining which combination of: a) IPBD/GP,
b) density of AfM(IPBD)/(volume of support), c)
initial ionic strength, d) elution rate, and e)
(amount of GP)/(volume of support) loaded gives
the best separation of GP(IPBD) from wtGP (Sec.
10.1),

5) determining the smallest amount of GP(IPBD)
that can be isolated from a much larger amount of
wtGP using the optimal conditions (Sec. 10.2), and

6) determining the efficiency of the affinity
separation procedure (Sec. 10.3).
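Steps 5) and 6) can be quantified from titers of GP(IPBD) and wtGP measured before and after a separation. The text does not fix a formula; the sketch below assumes the common definitions of enrichment factor (change in the GP(IPBD):wtGP ratio) and recovery (fraction of input GP(IPBD) found in the eluate). The function name and the counts are hypothetical.

```python
def enrichment_and_recovery(target_in: float, background_in: float,
                            target_out: float, background_out: float):
    """Enrichment factor and recovery for one affinity separation.

    enrichment = (target/background after) / (target/background before)
    recovery   = fraction of the input target GPs found in the eluate
    "target" = GP(IPBD) counts, "background" = wtGP counts.
    """
    if min(target_in, background_in, background_out) <= 0:
        raise ValueError("input counts and eluted background must be positive")
    ratio_in = target_in / background_in
    ratio_out = target_out / background_out
    return ratio_out / ratio_in, target_out / target_in


if __name__ == "__main__":
    # Hypothetical counts: 10^3 GP(IPBD) mixed with 10^9 wtGP before the column,
    # 500 GP(IPBD) and 10^4 wtGP in the eluate.
    ef, rec = enrichment_and_recovery(1e3, 1e9, 500, 1e4)
    print(f"enrichment ~{ef:.1e}, recovery {rec:.0%}")
```

With these made-up numbers the ratio improves about 5 x 10^4-fold while half of the GP(IPBD) input is recovered.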


Part II optimizes separation of a single type of
GP(IPBD) from a large excess of a single different GP.
The optimum conditions will be used in Part III to
separate GP(PBD)s that bind the target from GP(PBD)s
that do not bind the target. The optimization will be
at one or more specific temperatures and at one or more
specific pHs. In Part III, the user must specify the
conditions under which the selected SBD should bind the
target. If the conditions of intended use differ
markedly from the conditions for which affinity
separation was optimized, the user must return to Part
II and optimize the affinity separation for conditions
similar to the conditions of intended use of the
selected SBD.

In Part III, we choose a target material and a
GP(IPBD) that was developed by the method of Part I and
that is suitable to the target material. Using IPBD as
the PPBD to the first cycle of variegation, we prepare
a wide variety of osp-pbd genes that encode a wide
variety of PBDs. We use an affinity separation,
developed by the method of Part II, to enrich the
population of GP(vgPBD)s for GPs that display PBDs with
binding properties relative to the target that are
superior to the binding properties of the PPBD. An SBD
selected from one variegation cycle becomes the PPBD to
the next variegation cycle. In a preferred embodiment,
Part III of the process of the present invention
involves:

1) picking a target molecule (Sec. 11),

2) picking a GP(IPBD) (Sec. 12),

3) picking a set of several residues in the PPBD
to vary based on a) the 3D structure of the IPBD,
b) sequences of homologous proteins, and c)
computer or theoretical modeling that indicates




which residues can tolerate different amino acids
without disrupting the underlying structure (Sec.
13.1),

4) picking a subset of the residues to be varied
simultaneously based on the number of different
variants and which variants are within the
detection capabilities of the affinity separation
(Sec. 13.2),

5) implementing the variegation by:

a) synthesizing the part of the osp-pbd gene
that encodes the residues to be varied using a
specific mixture of nucleotide substrates for
some or all of the bases encoding residues
slated for variation, thereby creating a
population of DNA molecules, denoted vgDNA
(Sec. 13.3),

b) ligating this vgDNA, by standard methods,
into the operative cloning vector (OCV) (e.g.
a plasmid or bacteriophage) (Sec. 14.1),

c) using the ligated DNA to transform cells,
thereby producing a population of transformed
cells (Sec. 14.2),

d) culturing (i.e. increasing in number) the
population of transformed cells and harvesting
the population of GP(PBD)s, said population
being denoted as GP(vgPBD) (Sec. 14.3),

e) enriching the population for GPs that bind
the target by using the affinity separation
process developed in Part II, with the chosen
target molecule as affinity molecule (Sec. 15),

f) repeating steps III.5.d and III.5.e until a
GP(SBD) having improved binding to the target
is isolated (Sec. 15), and

g) testing the isolated SBD or SBDs for
affinity and specificity for the chosen target
(Sec. 15.8),

6) repeating steps III.3, III.4, and III.5 until
the desired degree of binding is obtained.
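Steps III.4 and III.5.a depend on how a chosen nucleotide mixture translates into amino acid frequencies, the quantity abbreviated Abun(x) in Sec. 0.2, and on how many distinct DNA sequences the variegated segment contains. The sketch below computes both from per-position base fractions using the standard genetic code. The "NNS" mixture (N = equal A, C, G, T; S = equal C, G at the third base) is an illustrative assumption, not a mixture specified in the text, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered with the
# first base varying slowest and the third fastest, each over T, C, A, G.
# '*' marks a stop codon.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AA[16 * i + 4 * j + k]
    for (i, b1), (j, b2), (k, b3) in product(enumerate(BASES), repeat=3)
}


def amino_acid_abundance(mix):
    """Abun(x) for one variegated codon.

    `mix` is a sequence of three dicts (codon positions 1, 2, 3), each mapping
    a base to its fraction in the nucleotide mixture used at that position.
    Returns a dict: amino acid (or '*') -> fraction of DNA molecules encoding it.
    """
    abun = defaultdict(float)
    for codon, aa in CODE.items():
        p = 1.0
        for pos, base in enumerate(codon):
            p *= mix[pos].get(base, 0.0)
        abun[aa] += p
    return dict(abun)


def distinct_dna_sequences(mix, n_codons):
    """Distinct DNA sequences when n_codons positions all use the same mixture."""
    per_codon = 1
    for pos in mix:
        per_codon *= sum(1 for f in pos.values() if f > 0)
    return per_codon ** n_codons


# Illustrative mixture (an assumption, not taken from the text): "NNS" codons.
N = {b: 0.25 for b in "ACGT"}
S = {"C": 0.5, "G": 0.5}
NNS = [N, N, S]

if __name__ == "__main__":
    abun = amino_acid_abundance(NNS)
    for aa in sorted(abun):
        print(aa, round(abun[aa], 4))
    print("DNA sequences for 3 varied codons:", distinct_dna_sequences(NNS, 3))
```

Under this assumed scheme, 32 codons encode all 20 amino acids plus one stop codon (TAG), so Abun(stop) = 1/32, while leucine, arginine, and serine each receive 3/32; varying three codons this way gives 32^3 = 32,768 DNA sequences.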

Part III is repeated for each new target material.
Part I need be repeated only if no GP(IPBD) suitable to
a chosen target is available. Part II need be repeated
for each newly developed GP(IPBD) and for previously
developed GP(IPBD)s if the intended conditions of use
of a novel binding protein differ significantly from
the conditions of previous optimizations.

Sec. 0.2: Abbreviations:

The following abbreviations will be used
throughout the present invention:

Abbreviation   Meaning

GP             Genetic Package, e.g. a bacteriophage
X              Any protein
x              The gene for protein X
IPBD           Initial Potential Binding Domain, e.g. BPTI
PBD            Potential Binding Domain, e.g. a derivative
               of BPTI
SBD            Successful Binding Domain, e.g. a derivative
               of BPTI selected for binding to a target
PPBD           Parental Potential Binding Domain, i.e. an
               IPBD or an SBD from a previous selection
OSP            Outer Surface Protein, e.g. coat protein of a
               phage or LamB from E. coli
OSP-PBD        Fusion of an OSP and a PBD, order of fusion
               not specified
OSTS           Outer Surface Transport Signal
GP(x)          A genetic package containing the x gene
GP(X)          A genetic package that displays X on its
               outer surface
{Q}            An affinity matrix supporting "Q", e.g.
               {T4 lysozyme} is T4 lysozyme attached to an
               affinity matrix
AfM(W)         A molecule having affinity for "W", e.g.
               trypsin is an AfM(BPTI)
XINDUCE        A chemical that can induce expression of a
               gene, e.g. IPTG for the lacUV5 promoter
OCV            Operative Cloning Vector
K_T            K_T = [T][SBD]/[T:SBD] (T is a target)
K_N            K_N = [N][SBD]/[N:SBD] (N is a non-target)
DoAMoM         Density of AfM(W) on affinity matrix
Abun(x)        Abundance of DNA molecules encoding amino
               acid x
OMP            Outer membrane protein
nt             nucleotide
K_d            A bimolecular dissociation constant,
               K_d = [A][B]/[A:B]
S              Error level in synthesizing vgDNA
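The dissociation constants defined above determine what fraction of a displayed SBD is occupied at a given free concentration. Rearranging K_T = [T][SBD]/[T:SBD] gives [T:SBD]/([SBD] + [T:SBD]) = [T]/([T] + K_T), and the same expression holds for a non-target with K_N. A minimal sketch, assuming simple 1:1 binding at equilibrium and a known free (not total) concentration; the function name and numerical values are hypothetical:

```python
def fraction_bound(k_t: float, free_target: float) -> float:
    """Equilibrium fraction of SBD present as the T:SBD complex.

    From K_T = [T][SBD]/[T:SBD] it follows that
        [T:SBD] / ([SBD] + [T:SBD]) = [T] / ([T] + K_T).
    `free_target` is the free target concentration [T], in the same units as k_t.
    """
    if k_t <= 0 or free_target < 0:
        raise ValueError("k_t must be positive and free_target non-negative")
    return free_target / (free_target + k_t)


if __name__ == "__main__":
    t = 1e-8                      # hypothetical free concentration, molar
    k_t, k_n = 1e-9, 1e-6         # hypothetical K_T (target) and K_N (non-target)
    print("bound to target:    ", round(fraction_bound(k_t, t), 3))
    print("bound to non-target:", round(fraction_bound(k_n, t), 4))
```

At equal free concentrations, a smaller K_T than K_N translates directly into preferential occupancy by the target, which is what the affinity separation exploits.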
Sec. 0.3: Standard sequencing method:

The present invention is not limited to a single
method of determining the sequence of nucleotides (nts)
in DNA subsequences. Sequencing reactions, agarose gel
electrophoresis, and polyacrylamide gel electrophoresis
(PAGE) are performed by standard procedures (AUSU87).

The present invention is not limited to a single
method of determining protein sequences, and reference
in the appended claims to determining the amino acid
sequence of a domain is intended to include any
practical method or combination of methods, whether
direct or indirect. The preferred method, in most
cases, is to determine the sequence of the DNA that
encodes the protein and then to infer the amino acid
sequence. In some cases, standard methods of protein
sequence determination may be needed to detect
post-translational processing.

— — — * ★ * — — — 

The major steps in the process of making and
isolating a novel binding protein with affinity for a
chosen target material are illustrated in Figure 2.

Sec. 1: Specification of Genetic Package and Means for
Displaying a Heterologous Binding Domain On Its Outer
Surface:

Sec. 1.0: General Requirements for Genetic Packages:

It is emphasized that the GP on which
selection-through-binding will be practiced must be capable,
after the selection, either of growth in some suitable
environment or of in vitro amplification and recovery
of the encapsulated genetic message. During at least
part of the growth, the increase in number must be
approximately exponential with respect to time. The
component of a population that exhibits the desired
binding properties may be quite small, for example, one
in 10^6 or less. Once this component of the population
is separated from the non-binding components, it must
be possible to amplify it. Culturing viable cells is
the most powerful amplification of genetic material
known and is preferred. Genetic messages can also be
amplified in vitro, but this is not preferred.
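Because amplification is approximately exponential, the number of doublings needed to recover a rare component grows only with the logarithm of the required fold-increase. A minimal sketch, assuming uninterrupted exponential growth and using the 20-40 minute doubling time that this section gives as preferred for cellular GPs; the function names are hypothetical:

```python
import math


def doublings_needed(start: float, goal: float) -> int:
    """Whole doublings required to grow from `start` to at least `goal` individuals."""
    if start <= 0 or goal < start:
        raise ValueError("require 0 < start <= goal")
    return math.ceil(math.log2(goal / start))


def amplification_time_min(start: float, goal: float,
                           doubling_time_min: float = 30.0) -> float:
    """Minutes of idealized exponential growth needed to reach `goal`."""
    return doublings_needed(start, goal) * doubling_time_min


if __name__ == "__main__":
    for td in (20.0, 40.0):
        print(f"1 -> 1e9 cells at {td:.0f} min/doubling:",
              amplification_time_min(1, 1e9, td), "min")
```

Growing a single recovered cell to about 10^9 cells takes 30 doublings, i.e. 10 to 20 hours at 20 to 40 minutes per doubling.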

A GP may typically be a vegetative bacterial cell,
a bacterial spore, or a bacterial DNA virus. A strain
of any living cell or virus is potentially useful if
the strain can be:

1) maintained in culture,

2) affinity separated and retain its viability,

3) genetically altered with reasonable facility,
and

4) manipulated to display the potential binding
protein domain where it can interact with the
target material during affinity separation.


DNA encoding the IPBD sequence may be operably
linked to DNA encoding at least the outer surface
transport signal of an outer surface protein (OSP)
native to the GP so that the IPBD is displayed on the
outer surface of the GP. It should be possible to
cause a genetic package to display the IPBD or PBD on
its outer surface without adversely affecting the
viability of the GP or the binding characteristics of
the IPBD or PBD if the fusion is near domain
boundaries (BECK83, CRAW87, TOTH86, SMIT85, MANO86; and
cf. ROSS81, HOLL83).

Those characteristics of a protein that are
recognized by a cell and that cause it to be
transported out of the cytoplasm and displayed on the
cell surface will be termed "outer-surface transport
signals".

The replicable genetic entity (phage or plasmid)
that carries the osp-pbd genes (derived from the
osp-ipbd gene) through the selection-through-binding
process (see Sec. 14) is referred to hereinafter as the
operative cloning vector (OCV). When the OCV is a
phage, it may also serve as the genetic package. The
choice of a GP is dependent in part on the availability
of a suitable OCV and suitable OSP.

Preferably, the GP is readily stored, for example,
by freezing. If the GP is a cell, it should have a
short doubling time, such as 20-40 minutes. If the GP
is a virus, it should be prolific, e.g., a burst size
of at least 100 per infected cell. GPs which are finicky
or expensive to culture are disfavored. The GP should
be easy to harvest, preferably by centrifugation. The
GP is preferably stable for a temperature range of -70
to 42°C (stable at 4°C for several days or weeks);
resistant to shear forces found in HPLC; insensitive to
UV; tolerant of desiccation; and resistant to a pH of
2.0 to 10.0, surface active agents such as SDS or
Triton, chaotropes such as 4M urea or 2M guanidinium
HCl, common ions such as K+, Na+, and SO4^2-, common
organic solvents such as ether and acetone, and
degradative enzymes. Finally, there must be a suitable
OCV (see Sec. 3).

Preferably, the 3D structure of the OSP and the
sequence of the OSP gene p. 47 are known. If the 3D
structure is not known, there is preferably knowledge
of which residues are exposed on the cell surface, the
location of the domain boundaries within the OSP,
and/or of successful fusions of the OSP and a foreign
insert. The OSP preferably appears in numerous copies
on the outer surface of the GP, and preferably serves a
non-essential function. It is desirable that the OSP
not be post-translationally processed, or at least that
this processing be understood.

The preferred GP, OCV, and OSP are those for which
the fewest serious obstacles can be seen, rather than
those that score highest on any one criterion.

Next, we consider general answers to the questions
posed in this step for the cases of: a) vegetatively
growing bacterial cells (Sec. 1.1), b) bacterial spores
(Sec. 1.2), and c) bacteriophages (Sec. 1.3). Preferred
OSPs for several GPs are given in Table 2.

Sec. 1.1: Bacterial Cells as Genetic Packages:

One may choose any well-characterized bacterial
strain which may be grown in culture. The important
questions in this case are: a) do we know enough about
mechanisms that localize proteins on the outside of the
cell, b) will the IPBD fold in the environment of the
outer membrane, and c) will cells change expression of
osp-pbd, derived from osp-ipbd, during affinity
separation? Some IPBDs may need large or insoluble
prosthetic groups, such as an Fe4S4 cluster, that are
available within the cell, but not in the medium. The
formation of Fe4S4 clusters found in some ferredoxins
is catalyzed by enzymes found in the cell (BONO85).
IPBDs that require such prosthetic groups may fail to
fold or function if displayed on bacterial cells.

Sec. 1.1.1: Preferred Bacterial Cells as GP:

In view of the extensive knowledge of E. coli, a
strain of E. coli defective in recombination is the
strongest candidate as a bacterial GP. Other preferred
candidates are Salmonella typhimurium, Bacillus
subtilis, and Pseudomonas aeruginosa.

Sec. 1.1.2: Preferred Outer Surface Proteins for
Displaying IPBDs on Bacterial Cells:

Gram-negative bacteria have outer-membrane
proteins (OMPs) that form a subset of OSPs. Many OMPs
span the membrane one or more times. The signals that
cause OMPs to localize in the outer membrane are
encoded in the amino acid sequence of the mature
protein. Fusions of fragments of omp genes with
fragments of an x gene have led to X appearing on the
outer membrane (BENS84, CLEM81). If no fusion data are
available, then we fuse an ipbd fragment to various
fragments of the osp gene and obtain GPs that display
the osp-ipbd fusion on the cell outer surface by
screening or selection for the display-of-IPBD
phenotype.
Oliver has reviewed mechanisms of protein
secretion in bacteria (OLIV85 and OLIV87). Nikaido and
Vaara (NIKA87) have reviewed mechanisms by which
proteins become localized to the outer membrane of
Gram-negative bacteria. For example, the LamB protein
of E. coli is synthesized with a typical signal
sequence, which is subsequently removed. Benson et al.
(BENS84) showed that LamB-LacZ fusion proteins would be
deposited in the outer membrane of E. coli when
residues 1-49 of the mature LamB protein are included
in the fusion, but that residues 1-43 are insufficient.

LamB of E. coli is a porin for maltose and
maltodextrin transport, and serves as the receptor for
adsorption of bacteriophages lambda and K10. This
protein has been purified to homogeneity (ENDE78) and
shown to function as a trimer (PALV79). Mutations to
phage resistance have been used to define the parts of
the LamB protein that adsorb each phage (ROAM80,
CLEM81, CLEM83, GEHR87).

Topological models have been developed that
describe the function of phage receptor and
maltodextrin transport. The models describe these
domains and their locations with respect to the
surfaces of the outer membrane (CLEM81, CLEM83, CHAR84,
HEIN88).

LamB is transported to the outer membrane if a
functional N-terminal sequence is present; further, the
first 49 amino acids of the mature sequence are
required for successful transport (BENS84). Homology
between parts of LamB protein and other outer membrane
proteins OmpC, OmpF, and PhoE has been detected
(NIKA84), including homology between LamB amino acids
39-49 and sequences of the other proteins. These
subsequences may label the proteins for transport to
the outer membrane. Further, monoclonal antibodies
derived from mice immunized with purified LamB have
been used to characterize four distinct topological and
functional regions, two of which are concerned with
maltose transport (GABA82).

Sec. 1.1.3: Choice of Insertion site for IPBD in
Bacterial Cell OSP:

For fusions of phoA into the coding sequence
for an integral membrane protein, the PhoA domain is
localized according to where in the integral membrane
protein the phoA gene was inserted (BECK83 and MANO86).
That is, if phoA is inserted after an amino acid which
normally is found in the cytoplasm, then PhoA appears
in the cytoplasm. If phoA is inserted after an amino
acid normally found in the periplasm, however, then the
PhoA domain is localized on the periplasmic side of the
membrane, and anchored in it. Beckwith and colleagues
(BECK88) have extended these observations to the lacZ
gene, which can be inserted into genes for integral
membrane proteins such that the LacZ domain appears in
either the cytoplasm or the periplasm according to
where the lacZ gene was inserted.

OSP-IPBD fusion proteins need not fill a
structural role in the outer membranes of Gram-negative
bacteria because parts of the outer membranes are not
highly ordered. For large OSPs there is likely to be
one or more sites at which osp can be truncated and
fused to ipbd such that cells expressing the fusion
will display IPBDs on the cell surface. If fusions
between fragments of osp and x have been shown to
display X on the cell surface, we can design an
osp-ipbd gene by substituting ipbd for x in the DNA
sequence. Otherwise, successful OMP-IPBD fusion is
preferably sought by fusing fragments of the best omp
to an ipbd, expressing the fused gene, and testing the
resultant GPs for the display-of-IPBD phenotype. We use
the available data about the OMP to pick the point or
points of fusion between omp and ipbd to maximize the
likelihood that IPBD will be displayed. Alternatively,
we truncate osp at several sites or in a manner that
produces osp fragments of variable length, fuse the
osp fragments to ipbd, and screen or select for cells
expressing the fusion that display IPBDs on the cell
surface. An additional alternative is to include short
segments of random DNA in the fusion of omp fragments
to ipbd and then screen or select the resulting
variegated population for members exhibiting the
display-of-IPBD phenotype.

The promoter for the osp-ipbd gene is preferably
subject to regulation by a small chemical inducer, such
as isopropyl thiogalactoside (IPTG) for the lacUV5
promoter. It need not come from a natural osp gene; any
regulatable bacterial promoter can be used (MANI82).

Once a genetic packaging system employing
vegetative bacterial cells has been designed, it is
time to choose an IPBD (Sec. 2).

Sec. 1.1.4: In Vivo Selection for Pseudo-osp Gene From
Random DNA Inserts in Bacterial Cells:

As an alternative to choosing a natural OSP and an
insertion site in the OSP, we can construct a gene
comprising: a) a regulatable promoter (e.g. lacUV5), b)
a Shine-Dalgarno sequence, c) a periplasmic transport
signal sequence, d) a fusion of the ipbd gene with a
segment of random DNA (as in Kaiser et al. (KAIS87)),
e) a stop codon, and f) a transcriptional terminator.
The random DNA, which preferably comprises 90-300
bases, encodes numerous potential OSTSs (cf. KAIS87).
The fusion of ipbd and the random DNA could be in
either order, but ipbd upstream is slightly preferred.
Isolates from the population generated in this way can
be screened for display of the IPBD. Preferably, a
version of selection-through-binding is used to select
GPs that display IPBD on the GP surface, and thus
contain a DNA insert encoding a functional OSTS.
Alternatively, clonal isolates of GPs may be screened
for the display-of-IPBD phenotype.
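Because the random DNA is translated in frame with ipbd, any stop codon within it ends the fusion at that point, so only part of each insert may actually be expressed. The fraction of inserts in the preferred 90-300 base range that stay open in a given frame can be estimated roughly. This back-of-the-envelope sketch assumes equal, independent base frequencies (real synthetic or genomic DNA may deviate), so that each codon is a stop (TAA, TAG, or TGA) with probability 3/64; the function name is hypothetical.

```python
def p_no_stop(n_bases: int, stop_fraction: float = 3 / 64) -> float:
    """Probability that random DNA of n_bases, read in one frame, has no stop codon.

    Assumes equal, independent base frequencies, so each codon is a stop
    (TAA, TAG, or TGA) with probability 3/64.
    """
    if n_bases < 0:
        raise ValueError("n_bases must be non-negative")
    n_codons = n_bases // 3
    return (1.0 - stop_fraction) ** n_codons


if __name__ == "__main__":
    for n in (90, 150, 300):
        print(n, "bases:", round(p_no_stop(n), 4))
```

Under these assumptions roughly a quarter of 90-base inserts, and under 1% of 300-base inserts, lack a stop codon in a given frame.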

The preference for ipbd upstream of the random DNA
arises from consideration of the manner in which the
successful GP(IPBD) will be used. In Part III, we will
introduce numerous mutations into the pbd region of the
osp-pbd gene, some of which might include gratuitous
stop codons. If pbd precedes the random DNA, then
gratuitous stop codons in pbd lead to no OSP-PBD
protein appearing on the cell surface. If pbd follows
the random DNA, then gratuitous stop codons in pbd
might lead to incomplete OSP-PBD proteins appearing on
the cell surface. Incomplete proteins often are
non-specifically sticky so that GPs displaying incomplete
PBDs are easily removed from the population.

Sec. 1.2: Displaying IPBD on Bacterial Spores:

Bacterial spores have desirable properties as GP
candidates. Bacillus spores neither actively
metabolize nor alter the proteins on their surface.
Moreover, spores are much more resistant than vegetative
bacterial cells or phage to chemical and physical
agents. Spores have the disadvantage that the
molecular mechanisms that trigger sporulation are less
well worked out than is the formation of M13 or the
export of protein to the outer membrane of E. coli.

Sec. 1.2.1: Preferred Bacterial Spores for Use as GPs:

Bacteria of the genus Bacillus form endospores
that are extremely resistant to damage by heat,
radiation, desiccation, and toxic chemicals (reviewed
by Losick et al. (LOSI86)). These spores have complex
structure and morphogenesis that is species-specific
and only partially elucidated. The following
observations are relevant to the use of Bacillus spores
as genetic packages.

Plasmid DNA is commonly included in spores.
Plasmid-encoded proteins have been observed on the
surface of Bacillus spores (DEBR86). Sporulation
involves complex temporal regulation that is moderately
well understood (LOSI86). The sequences of several
sporulation promoters are known; coding sequences
operatively linked to such promoters are expressed only
during sporulation (RAYC87).

Donovan et al. have identified several polypeptide
components of the B. subtilis spore coat (DONO87); the
sequences of two complete coat proteins and
amino-terminal fragments of two others have been determined.
Some components of the spore are synthesized in the
forespore, e.g. small acid-soluble spore proteins
(ERRI88), while other components are synthesized in the
mother cell and appear in the spore (e.g. the coat
proteins). This spatial organization of synthesis is
controlled at the transcriptional level.

Spores self-assemble, but the signals that cause
various proteins to localize in different parts of the
spore are not well understood; presumably, the signals
controlling deposition of the coat proteins from the
cytoplasm of the mother cell onto the spore coat are
embedded in the polypeptide sequence. Some, but not
all, of the coat proteins are synthesized as precursors
and are then processed by specific proteases before
deposition in the spore coat (DONO87). Viable spores
that differ only slightly from wild type are produced
in B. subtilis even if any one of four coat proteins is
missing (DONO87). Disulfide bonds form within the
spore (thiol reducing agents are needed to solubilize
several of the proteins of the coat). The 12-kd coat
protein, CotD, contains 5 cysteines. CotD also
contains an unusually high number of histidines (16)
and prolines (7). The 11-kd coat protein, CotC,
contains only one cysteine and one methionine. CotC
has a very unusual amino-acid sequence with 19 lysines
(K), appearing as 9 K-K dipeptides and one isolated K.
There are also 20 tyrosines (Y), of which 10 appear as 5
Y-Y dipeptides. Peptides rich in Y and K are known to
become crosslinked in oxidizing environments (DEVO78,
WAIT83, WAIT86). CotC contains 16 D and E amino acids,
nearly equal to the 19 Ks. There are no A, F, R, I,
L, N, P, Q, S, or W amino acids in CotC. Neither CotC
nor CotD is post-translationally cleaved. The proteins
CotA and CotB are post-translationally cleaved.

Endospores from the genus Bacillus are more stable
than are exospores from Streptomyces. Bacillus
subtilis forms spores in 4 to 6 hours, but Streptomyces
species may require days or weeks to sporulate. In
addition, genetic knowledge and manipulation are much
more developed for B. subtilis than for other
spore-forming bacteria. Thus Bacillus spores are preferred
over Streptomyces spores. Bacteria of the genus
Clostridium also form very durable endospores, but
Clostridia, being strict anaerobes, are not convenient
to culture. The choice of a species of Bacillus is
governed by knowledge and availability of cloning
systems and by how easily sporulation can be
controlled. A particular strain is chosen by the
criteria listed in Sec. 1.0. Many vegetative
biochemical pathways are shut down when sporulation
begins, so that prosthetic groups might not be
available.


Sec. 1.2.2: Preferred Outer-Surface Proteins for
Displaying IPBD on Bacterial Spores:

If a spore is chosen as GP, the promoter is the
most important part of the osp gene, because the
promoter of a spore coat protein is most active: a)
when spore coat protein is being synthesized and
deposited onto the spore and b) in the specific place
that spore coat proteins are being made. In B.
subtilis, some of the spore coat proteins are
post-translationally processed by specific proteases. It is
valuable to know the sequences of precursors and mature
coat proteins so that we can avoid incorporating the
recognition sequence of the specific protease into our
construction of an OSP-IPBD fusion. The sequence of a
mature spore coat protein contains information that
causes the protein to be deposited in the spore coat;
thus gene fusions that include some or all of a mature
coat protein sequence are preferred for screening or
selection for the display-of-IPBD phenotype.

Fusions of ipbd fragments to cotC or cotD
fragments are likely to cause IPBD to appear on the
spore surface. The genes cotC and cotD are preferred
osp genes because CotC and CotD are not
post-translationally cleaved. Subsequences from cotA or
cotB could also be used to cause an IPBD to appear on
the surface of B. subtilis spores, but we must take the
post-translational cleavage of these proteins into
account. DNA encoding IPBD could be fused to a
fragment of cotA or cotB at either end of the coding
region or at sites interior to the coding region.
Spores could then be screened or selected for the
display-of-IPBD phenotype.

To date, no Bacillus sporulation promoter has been
shown to be inducible by an exogenous chemical inducer,
as the lac promoter of E. coli is. Nevertheless, the
quantity of protein produced from a sporulation
promoter can be controlled by other factors, such as
the DNA sequence around the Shine-Dalgarno sequence or
codon usage.

Sec. 1.2.3: Choice of Insertion Site for IPBD in OSP
of Bacterial Spores:

The considerations governing the insertion site in the
spore OSP are the same as those given in Section 1.1.3.

Sec. 1.2.4: In Vivo Selection for Pseudo-osp Genes
from Random DNA Inserts in Bacterial Spores:
Although the considerations for spores are nearly
identical to the considerations for vegetative
bacterial cells (Sec. 1.1), the available information
on the mechanisms that cause proteins to appear on
spores is meager, so the random-DNA approach becomes a
more attractive option.

We can use the approach described above in Sec. 1.1.4
for attaching an IPBD to an E. coli cell, except that:
a) a sporulation promoter is used, and b) no
periplasmic signal sequence should be present.

Sec. 1.3: Displaying IPBD on Outer Surface of Phages:

Sec. 1.3.1: Preferred Phages for Use as GPs:

Unlike the choice of bacterial cells and spores, the
choice of a phage depends strongly on knowledge of the
3D structure of an OSP and how it interacts with other
proteins in the capsid. The size of the phage genome
and the packaging mechanism are also important because
the phage genome itself is the cloning vector. The
osp-ipbd gene must be inserted into the phage genome;
therefore:

1) the virion must be capable of accepting the
insertion or substitution of genetic material, and

2) the genome of the phage must be small enough to
allow convenient manipulation.

Additional considerations in choosing phage are: 1)
the morphogenetic pathway of the phage determines the
environment in which the IPBD will have opportunity to
fold, 2) IPBDs containing essential disulfides may not
fold within a cell, 3) IPBDs needing large or insoluble
prosthetic groups may not fold if secreted because the
prosthetic group is lacking, and 4) when variegation is
introduced in Part III, multiple infections could
generate hybrid GPs that carry the gene for one PBD but
have at least some copies of a different PBD on their
surfaces; it is preferable to minimize this
possibility.

Bacteriophages are excellent candidates for GPs
because there is little or no enzymatic activity
associated with intact mature phage, and because the
genes are inactive outside a bacterial host, rendering
the mature phage particles metabolically inert. The
filamentous phage M13 and bacteriophage PhiX174 are of
particular interest.

Filamentous Phage:

The entire life cycle of the filamentous phage
M13, a common cloning and sequencing vector, is well
understood. M13 and f1 are so closely related that we
consider the properties of each relevant to both
(RASC86); any differentiation is for historical
accuracy. The genetic structure of M13 (the complete
sequence (SCHA78), the identity and function of the ten
genes, and the order of transcription and location of
the promoters) is well known, as is the physical
structure of the virion (BANN81, BOEK80, CHAN79,
ITOK79, KAPL78, KUHN85b, KUHN87, MAKO80, MARV78,
MESS78, OHKA81, RASC86, RUSS81, SCHA78, SMIT85, WEBS78,
and ZIMM82); see RASC86 for a recent review of the
structure and function of the coat proteins.

Relevant facts about M13 are disclosed in Example


Bacteriophage PhiX174:

The bacteriophage PhiX174 is a very small
icosahedral virus which has been thoroughly studied by
genetics, biochemistry, and electron microscopy (see
The Single-Stranded DNA Phages (DENH78)). To date, no
proteins from PhiX174 have been studied by X-ray
diffraction. PhiX174 is not used as a cloning vector
because PhiX174 can accept almost no additional DNA;
the virus is so tightly constrained that several of its
genes overlap. Chambers et al. (CHAM82) showed that
mutants in gene G are rescued by the wild-type G gene
carried on a plasmid, so that the host supplies this
protein.



Three gene products of PhiX174 are present on the
outside of the mature virion: F (capsid), G (major
spike protein, 60 copies per virion), and H (minor
spike protein, 12 copies per virion). The G protein
comprises 175 amino acids, while H comprises 328 amino
acids. The F protein interacts with the single-stranded
DNA of the virus. The proteins F, G, and H are
translated from a single mRNA in the virus-infected
cell.


Large DNA Phages:

Phage such as lambda or T4 have much larger
genomes than do M13 or PhiX174. Large genomes are less
conveniently manipulated than small genomes. A phage
with a large genome, however, could be used if genetic
manipulation is sufficiently convenient. Phage such as
lambda and T4 have more complicated 3D capsid
structures than M13 or PhiX174, with more OSPs to
choose from. Phage lambda virions and phage T4 virions
form intracellularly, so that IPBDs requiring large or
insoluble prosthetic groups might fold on the surfaces
of these phage. Phage lambda and phage T4 are not
preferred; however, derivatives of these phages could
be constructed to overcome these disadvantages.

RNA Phages:

RNA phage, such as Qbeta, are not preferred
because manipulation of RNA is much less convenient
than is the manipulation of DNA. Although competent
RNA bacteriophage are not preferred, useful genetically
altered RNA-containing particles could be derived from
RNA phage, such as MS2.

To use MS2 as a GP, we would need to eliminate
most of the natural viral genome so that an osp-ipbd
gene could fit into the protein capsid. It is known
that the A protein binds sequence-specifically to a
site at the 5' end of the (+) RNA strand, triggering
formation of RNA-containing particles if coat protein
is present. If a message containing the A protein
binding site and the gene for a chimera of coat protein
and a PBD were produced in a cell that also contained A
protein and wild-type coat protein (both produced from
regulated genes on a plasmid), then the RNA coding for
the chimeric protein would get packaged. A package
comprising RNA encapsulated by proteins encoded by that
RNA satisfies the major criterion that the genetic
message inside the package specifies something on the
outside. The particles by themselves are not viable.


After isolating the packages that carry an SBD, we
would need to:

1) separate the RNA from the protein capsid,

2) reverse transcribe the RNA into DNA, using AMV
or MMTV reverse transcriptase, and

3) amplify the DNA by several cycles of polymerase
chain reaction (PCR) until there is enough to
subclone the recovered genetic message into a
plasmid for sequencing and further work.

Alternatively, helper phage could be used to rescue the
isolated phage.


Sec. 1.3.2: Preferred Outer-Surface Proteins for
Displaying IPBDs on Phages:


For a given bacteriophage, the preferred OSP is
usually one that is present on the phage surface in the
largest number of copies, as this allows the greatest
flexibility in varying the ratio of OSP-IPBD to
wild-type OSP and also gives the highest likelihood of
obtaining satisfactory affinity separation. Moreover,
a protein present in only one or a few copies usually
performs an essential function in morphogenesis or
infection; mutating such a protein by addition or
insertion is likely to reduce the viability of the GP.


It is preferred that the wild-type osp gene be
preserved. The ipbd gene fragment may be inserted
either into a second copy of the recipient osp gene or
into a novel engineered osp gene. The preferred OSP
for use when the GP is M13 is the gene III protein (see
Example 1).

Sec. 1.3.3: Choice of Insertion Site for IPBD in OSP:

The user must choose a site in the candidate OSP
gene for inserting an ipbd gene fragment. The coats of
most bacteriophage are highly ordered. Thus in
bacteriophage, unlike the cases of bacteria and spores,
it is important to retain most or all of the residues
of the parental OSP in engineered OSP-IPBD fusion
proteins. A preferred site for insertion of the ipbd
gene into the phage osp gene is one in which: a) the
IPBD folds into its original shape, b) the OSP domains
fold into their original shapes, and c) there is no
interference between the two domains.

If there is a 3D model of the phage that indicates
that either the amino or carboxy terminus of an OSP is
exposed to solvent, then the exposed terminus of that
mature OSP becomes the prime candidate for insertion of
the ipbd gene. A low-resolution 3D model suffices.

In the absence of a 3D structure, the amino and
carboxy termini of the mature OSP are the best
candidates for insertion of the ipbd gene. A
functional fusion may require additional residues
between the IPBD and OSP domains to avoid unwanted
interactions between the domains. Random-sequence DNA,
or DNA coding for a specific sequence of a protein
homologous to the IPBD or OSP, can be inserted between
the osp fragment and the ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also
a good approach for obtaining a functional fusion.
Smith exploited such a boundary when subcloning
heterologous DNA into gene III of f1 (SMIT85).


There are several methods of identifying domains.
Methods that rely on atomic coordinates have been
reviewed by Janin and Chothia (JANI85); see also ROSE85,
RASH84, VITA84, PABO79, POTE83, and SCOT87.

If the only structural information available is
the amino acid sequence of the candidate OSP, we use
the sequence to predict turns and loops. There is a
high probability that some of the loops and turns will
be correctly predicted (cf. Chou and Fasman (CHOU72));
these locations are also candidates for insertion of
the ipbd gene fragment.

Sec. 1.3.4: In Vivo Selection for Pseudo-OSP Gene from
Random DNA Inserts in Phages:

Alternatively, a functional insertion site may be
determined by generating a number of recombinant
constructions and selecting the functional strain by
phenotypic characteristics. Because the OSP-IPBD must
fulfill a structural role in the phage coat, it is
unlikely that any particular random DNA sequence
coupled to the ipbd gene will produce a fusion protein
that fits into the coat in a functional way.
Nevertheless, random DNA inserted between large
fragments of a coat protein gene and the ipbd gene will
produce a population that is likely to contain one or
more members that display the IPBD on the outside of a
viable phage. A display probe, similar to that defined
in Sec. 1.1.4, is constructed, and random DNA sequences
are cloned into appropriate sites.

Sec. 2: Choice of IPBD:

An IPBD may be chosen from naturally occurring
proteins or domains of naturally occurring proteins, or
may be designed from first principles. A designed
protein may have advantages over natural proteins if:
a) the designed protein is more stable, b) the designed
protein is smaller, or c) the charge distribution of
the designed protein can be specified more freely.

A candidate IPBD must meet the following criteria:
1) it is stable under the conditions of its intended use
(the domain may comprise the entire protein that will
be inserted, e.g. BPTI), 2) its amino acid sequence is
known or obtainable, 3) the residues on its outer
surface, and their spatial relationships, can be
identified, and 4) a molecule, AfM(IPBD), having high
specific affinity for the IPBD is available.

Preferably, the IPBD is no larger than necessary
because it is easier to arrange restriction sites in
smaller amino-acid sequences. The usefulness of
candidate IPBDs that meet all of these requirements
depends on the availability of the information
discussed below.

Information used to judge IPBD suitability
includes: 1) a 3D structure (knowledge strongly
preferred), 2) one or more sequences homologous to the
IPBD (the more homologous sequences known, the better),
3) the pI of the IPBD (knowledge necessary in some
cases), 4) the stability and solubility as a function
of temperature, pH and ionic strength (preferably known
to be stable over a wide range and soluble in
conditions of intended use), 5) ability to bind metal
ions such as Ca++ or Mg++ (knowledge preferred; binding
per se, no preference), 6) enzymatic activities, if any
(knowledge preferred; activity per se has uses but may
cause problems), 7) binding properties, if any
(knowledge preferred; specific binding also preferred),
8) availability of a molecule having specific and
strong affinity ($K_d < 10^{-11}$ M) for the IPBD
(preferred), 9) availability of a molecule having
specific and medium affinity ($10^{-8}$ M $< K_d < 10^{-6}$ M)
for the IPBD (preferred), 10) the sequence of a mutant
of IPBD that does not bind to the affinity molecule(s)
(preferred), and 11) absorption spectrum in visible,
UV, NMR, etc. (characteristic absorption preferred).

If only one species of molecule having affinity
for IPBD (AfM(IPBD)) is available, it will be used to:
a) detect the IPBD on the GP surface, b) optimize
expression level and density of the affinity molecule
on the matrix (Sec. 10.1), and c) determine the
efficiency and sensitivity of the affinity separation
(Secs. 10.2 and 10.3). As noted above, however, one
would prefer to have available two species of
AfM(IPBD), one with high and one with moderate affinity
for the IPBD. The species with high affinity would be
used in initial detection and in determining efficiency
and sensitivity (Secs. 10.2 and 10.3), and the species
with moderate affinity would be used in optimization
(Sec. 10.1).

For at least 20 candidate IPBDs the above
information is available or is practical to obtain, for
example, bovine pancreatic trypsin inhibitor (BPTI, 58
residues), crambin (46 residues), third domain of
ovomucoid (56 residues), T4 lysozyme (164 residues),
and azurin (128 residues).

In most of the PBDs derived from a PPBD according to
the process of the present invention, the changes
affect residues having side groups directed toward the
solvent.

Exposed residues can accept a wide range of amino
acids, while buried residues are more limited in this
regard (REID88). Surface mutations typically have only
small effects on the melting temperature of the PBD, but
may reduce the stability of the PBD. Hence the chosen
IPBD should have a high melting temperature (60°C
acceptable, the higher the better) and be stable over a
wide pH range (pH 3.0 to 8.0 acceptable; pH 2.0 to 11.0
preferred), so that the SBDs derived from the chosen
IPBD by mutation and selection-through-binding will
retain sufficient stability. Preferably, the
substitutions in the IPBD yielding the various PBDs do
not reduce the melting point of the domain below 50°C.
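The numeric stability thresholds above can be written as a simple screening rule. The following is a minimal Python sketch, assuming each candidate is summarized by a measured melting temperature and the pH interval over which it is known to be stable; the class and function names (`CandidateIPBD`, `grade_stability`, `derived_pbd_is_stable_enough`) are hypothetical, and any values passed to them would be the user's own data.

```python
from dataclasses import dataclass


@dataclass
class CandidateIPBD:
    """Hypothetical summary of a candidate IPBD's stability data."""
    name: str
    tm_celsius: float  # melting temperature of the domain
    ph_low: float      # lowest pH at which the domain is stable
    ph_high: float     # highest pH at which the domain is stable


def grade_stability(c: CandidateIPBD) -> str:
    """Grade a candidate against the thresholds stated in the text.

    Tm of at least 60 C with stability over pH 3.0-8.0 is "acceptable";
    the same Tm with stability over pH 2.0-11.0 is "preferred".
    """
    if c.tm_celsius < 60.0 or c.ph_low > 3.0 or c.ph_high < 8.0:
        return "unsuitable"
    if c.ph_low <= 2.0 and c.ph_high >= 11.0:
        return "preferred"
    return "acceptable"


def derived_pbd_is_stable_enough(tm_celsius: float) -> bool:
    """Substitutions should preferably not push Tm below 50 C."""
    return tm_celsius >= 50.0
```

For example, a hypothetical domain with Tm 75°C that is stable from pH 2.5 to 9.0 grades as "acceptable" but not "preferred", because its stable range does not reach pH 2.0.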

Two general characteristics of the target
molecule, size and charge, make certain classes of
IPBDs more likely than other classes to yield
derivatives that will bind specifically to the target.
Because these are very general characteristics, one can
divide all targets into six classes: a) large positive,
b) large neutral, c) large negative, d) small positive,
e) small neutral, and f) small negative. A small
collection of IPBDs, one or a few corresponding to each
class of target, will contain a preferred candidate
IPBD for any chosen target.
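The six-way classification of targets can be sketched as a small function. This is a minimal Python illustration, not a prescribed procedure: the text names the classes but gives no numeric boundaries, so the size cutoff (`size_cutoff_daltons`) and the width of the "neutral" charge band (`neutral_band`) below are hypothetical parameters. The companion check encodes the preference, stated in Sec. 2.2, for opposite charges or at least one neutral partner.

```python
def classify_target(mass_daltons: float, net_charge: float,
                    size_cutoff_daltons: float = 5000.0,
                    neutral_band: float = 1.0) -> str:
    """Assign a target to one of the six size/charge classes.

    size_cutoff_daltons and neutral_band are hypothetical thresholds.
    net_charge is the net charge under the conditions of intended use.
    """
    size = "large" if mass_daltons >= size_cutoff_daltons else "small"
    if net_charge > neutral_band:
        charge = "positive"
    elif net_charge < -neutral_band:
        charge = "negative"
    else:
        charge = "neutral"
    return f"{size} {charge}"


def charges_compatible(ipbd_charge: str, target_charge: str) -> bool:
    """Prefer opposite charges, or at least one neutral partner."""
    if "neutral" in (ipbd_charge, target_charge):
        return True
    return ipbd_charge != target_charge
```

Classifying by mass alone does not capture the rule in Sec. 2.1 that a macromolecule lacking a compact structure, such as starch, is treated as a small molecule; a fuller version would need a structural flag as well.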

Alternatively, the user may elect to engineer a
GP(IPBD) for a particular target; Secs. 2.1 and 2.2 give
criteria that relate target size and charge to the
choice of IPBD.

Sec. 2.1: Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule,
a preferred embodiment of the IPBD is a small protein
such as BPTI from Bos taurus (58 residues), crambin
from Crambe abyssinica seeds (46 residues), or the third
domain of ovomucoid from Coturnix coturnix japonica
(Japanese quail) (56 residues) (PAPA82), because targets
from this class have clefts and grooves that can
accommodate small proteins in highly specific ways. If
the target is a macromolecule lacking a compact
structure, such as starch, it should be treated as if it
were a small molecule. Extended macromolecules with
defined 3D structure, such as collagen, should be
treated as large molecules.

If the target is a small molecule, such as a
steroid, a preferred embodiment of the IPBD is a
protein the size of ribonuclease from Bos taurus (124
residues), ribonuclease from Aspergillus oryzae (104
residues), hen egg white lysozyme from Gallus gallus
(129 residues), azurin from Pseudomonas aeruginosa (128
residues), or T4 lysozyme (164 residues), because such
proteins have clefts and grooves into which the small
target molecules can fit. The Brookhaven Protein Data
Bank contains 3D structures for these proteins. Genes
encoding proteins as large as T4 lysozyme can be
manipulated by standard techniques for the purposes of
this invention.

If the target is a mineral, insoluble in water,
one must consider the nature of the mineral's molecular
surface. Smooth surfaces (such as crystalline
silicon) require medium to large proteins (such as
ribonuclease) as IPBD in order to have sufficient
contact area and specificity. Rough, grooved surfaces
(such as zeolites) could be bound either by small
proteins (BPTI) or larger proteins (T4 lysozyme).

Sec. 2.2: Influence of target charge on choice of
IPBD:

Electrostatic repulsion between molecules of like
charge can prevent molecules with highly complementary
surfaces from binding. Therefore, it is preferred
that, under the conditions of intended use, the IPBD
and the target molecule either have opposite charge or
that one of them is neutral. Inclusion of counter ions
can reduce or eliminate electrostatic repulsion.

Sec. 2.3: Other aspects of choice of IPBD:

If the chosen IPBD is an enzyme, it may be
necessary to change one or more residues in the active
site to inactivate enzyme function. For example, if
the IPBD were T4 lysozyme and the GP were E. coli cells
or M13, we would inactivate the lysozyme lest it lyse
the cells. If, on the other hand, the GP were PhiX174,
then inactivation of lysozyme may not be needed because
T4 lysozyme can be overproduced inside E. coli cells
without detrimental effects and PhiX174 forms
intracellularly. It is preferred to inactivate enzyme
IPBDs that might be harmful to the GP or its host by
substituting mutant amino acids at one or more residues
of the active site. It is permitted to vary one or
more of the residues that were changed to abolish the
original enzymatic activity of the IPBD. Those GPs
that receive osp-pbd genes encoding an active enzyme
may die, but the majority of sequences will not be
deleterious.

Sec. 3: Choice of OCV:


The OCV is preferably small, e.g., less than 10
kb. It is desirable that cassette mutagenesis be
practical in the OCV; preferably, at least 25
restriction enzymes are available that do not cut the
OCV. It is likewise desirable that single-stranded
mutagenesis be practical. Finally, the OCV preferably
carries a selectable marker. A suitable OCV is
obtained or is engineered by manipulation of available
vectors. Plasmids are preferred over the bacterial
chromosome because genes on plasmids are much more
easily constructed and mutated than are chromosomal
genes. When bacteriophage are to be used, the osp-ipbd
gene must be inserted into the phage genome.
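Counting the restriction enzymes that do not cut a candidate OCV, or that cut it exactly once, is a plain string search over a circular sequence. The Python sketch below is illustrative only: `ENZYMES` is a small hand-picked subset of common enzymes with their standard recognition sites, so it cannot by itself reach the 25 non-cutters the text asks for; a real assessment would draw on a complete enzyme catalogue such as REBASE. The function names are hypothetical.

```python
# A few common type II enzymes and their (palindromic) recognition sites.
# This table is a small illustrative subset, not a complete catalogue.
ENZYMES = {
    "EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT",
    "PstI": "CTGCAG", "SalI": "GTCGAC", "XbaI": "TCTAGA",
    "KpnI": "GGTACC", "SacI": "GAGCTC", "SmaI": "CCCGGG",
    "XhoI": "CTCGAG", "NcoI": "CCATGG", "NotI": "GCGGCCGC",
}


def count_sites(circular_seq: str, site: str) -> int:
    """Count occurrences of `site` in a circular sequence (one strand).

    Searching one strand suffices here because every site in ENZYMES is
    its own reverse complement.
    """
    seq = circular_seq.upper()
    wrapped = seq + seq[: len(site) - 1]  # let matches span the origin
    return sum(1 for i in range(len(seq)) if wrapped.startswith(site, i))


def non_cutters(circular_seq: str) -> list:
    """Enzymes from ENZYMES that have no site in the OCV."""
    return sorted(n for n, s in ENZYMES.items() if count_sites(circular_seq, s) == 0)


def unique_cutters(circular_seq: str) -> list:
    """Enzymes from ENZYMES that cut the OCV exactly once."""
    return sorted(n for n, s in ENZYMES.items() if count_sites(circular_seq, s) == 1)
```

Non-palindromic recognition sites would also need a search of the reverse-complement strand, which this sketch omits.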

For phage such as M13, an antibiotic resistance
gene is engineered into the genome (HINE80). More
virulent phage, such as PhiX174, make discernible
plaques that can be picked, in which case a resistance
gene is not essential; furthermore, there is no room in
the PhiX174 virion to add any new genetic material.
Inability to include an antibiotic resistance gene is a
disadvantage because it limits the number of GPs that
can be screened.

It is preferred that GP(IPBD) carry a selectable
marker not carried by wtGP. It is also preferred that
wtGP carry a selectable marker not carried by GP(IPBD).

Sec. 4: Designing the osp-ipbd gene insert:

We design an amino acid sequence that will cause
the IPBD to appear on the GP surface when it is
expressed. This amino acid sequence may determine the
entire coding region of the osp-ipbd gene, or it may
contain only the ipbd sequence adjoining restriction
sites into which random DNA will be cloned (Sec. 6.2).

The actual gene may be produced by any means. The
pbd segment, derived from the ipbd segment, must be
easily genetically manipulated in the ways described in
Part III. Synthetic ipbd segments are preferred
because they allow greatest control over placement of
restriction sites.

Sec. 4.1: Genetic regulation of the osp-ipbd gene:

Regarding regulation of the osp-ipbd gene, the two
important questions are: a) how much OSP-IPBD do we
need on each GP, and b) how accurately must we regulate
the amount?

The essential function of the affinity separation
is to separate GPs that bear PBDs (derived from IPBD)
having high affinity for the target from GPs bearing
PBDs having low affinity for the target. If a gradient
of some solute, such as increasing salt, changes the
conditions, then all weakly-binding PBDs will cease to
bind before any strongly-binding PBDs cease to bind.
Regulation of the osp-pbd gene must be such that all
packages display sufficient PBD to effect a good
separation in Sec. 15. If the amount of PBD/GP had an
effect on the elution volume of the GP from the
affinity matrix, then we would need to regulate the
amount of PBD/GP accurately. The following analysis
shows that there is no strong linear effect of IPBD/GP
on elution volume and assumes only: a) that all GPs are
the same size, b) that interactions between the PBDs
and the affinity matrix dominate differential elution
of GPs, c) that the system is at equilibrium, and d)
that all PBDs on any one GP are identical.

If $N_p$ identical PBDs on a GP each have access to
target molecules, and each PBD has a free energy of
binding to the target of $\Delta G_b$, then the total
free energy of binding is

$$\Delta G_b^{\mathrm{tot}} = N_p \, \Delta G_b .$$

$\Delta G_b$ is a function of parameters of the solvent,
such as: 1) concentration of ions, 2) pH, 3)
temperature, 4) concentration of neutral solutes such
as sucrose, glucose, ethanol, etc., and 5) specific
ions, such as calcium, acetate, benzoate, nicotinate,
etc. If conditions are altered during affinity
separation so that $\Delta G_b$ approaches zero,
$\Delta G_b^{\mathrm{tot}}$ approaches zero $N_p$ times
faster. As $\Delta G_b^{\mathrm{tot}}$ goes to or above
zero, the packages will dissociate from the immobilized
target molecules and be eluted.
GPs bearing more PBDs have a sharper transition
between bound and unbound than packages with fewer of
the same PBDs. For equilibrium conditions, the
midpoint of the transition is determined only by the
solution conditions that bring the individual
interactions to zero free energy. The number of
PBDs/GP determines the sharpness of the transition.
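The claim that the midpoint is fixed while the sharpness grows with the number of PBDs per GP can be checked numerically. The sketch below assumes a simple two-state Boltzmann model of bound versus free packages, which the text does not specify; only the proportionality of the total free energy to $N_p \, \Delta G_b$ comes from the analysis above. `fraction_bound` is a hypothetical name, and energies are taken in kcal/mol with negative values meaning favourable binding.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_bound(delta_g_b: float, n_p: int, temp_k: float = 298.15) -> float:
    """Two-state estimate of the fraction of GPs retained on the matrix.

    Assumes total free energy = n_p * delta_g_b (kcal/mol, negative means
    favourable) and a Boltzmann weighting between bound and free states.
    """
    x = n_p * delta_g_b / (R_KCAL * temp_k)
    # 1 / (1 + exp(x)), written to avoid overflow for large |x|
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


# Sweep delta_G_b through zero, as a salt gradient might, for N_p = 1 and 10.
for dg in (-0.4, -0.2, 0.0, 0.2, 0.4):
    print(f"{dg:+.1f}  N_p=1: {fraction_bound(dg, 1):.3f}  "
          f"N_p=10: {fraction_bound(dg, 10):.3f}")
```

In this model both curves pass through 0.5 at $\Delta G_b = 0$ regardless of $N_p$, while the curve for $N_p = 10$ switches from bound to unbound over a much narrower range of $\Delta G_b$.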

It should also be noted that the number of PBDs/GP
is usually influenced by physiological conditions, so
that a sample of genetically identical GP(PBD)s may
contain GPs having different numbers of PBDs on the GP
surface. In a population of GP(vgPBD)s, each PBD
sequence will appear on more than one GP, and the
actual number of PBDs/GP will vary from GP to GP within
some range. Within a variegated population of PBDs,
let PBD_X be the PBD with maximum affinity for the
target. If the number of PBDs/GP has a linear effect on
elution volume, then the GPs having the greatest number
of PBD_X will be most retarded on the column. When we
culture the enriched population, the GP(PBD_X)s will be
amplified and give rise to new GP(PBD_X)s having
varying numbers of PBD_X/GP. Thus the affinity
separation process of the present invention could
tolerate a linear effect of number of PBDs/GP on the
elution volume of the GP(PBD), unless strong binding to
target fortuitously causes the PBD to be displayed on
the GP only in low number.

Since there is no linear effect on elution volume
from the number of IPBDs/GP, need for highly accurate
regulation of IPBD/GP is not anticipated. Reproducible
gene expression is more easily controlled using
regulated rather than constitutive genetic elements.
The analysis above assumes that GP(IPBD)s are in
equilibrium between solution in buffer and bound to the
affinity matrix. Rate of elution may be an important
parameter in column affinity chromatography. In batch
elution from an affinity matrix or elution from an
affinity plate, the time that each buffer is in contact
with the affinity material may be an important
variable. The density of affinity molecules on the
matrix is an important variable in optimizing the
affinity separation. Because the analysis above is
qualitative, in Sec. 10 of the preferred embodiment we
experimentally optimize: 1) the density of IPBD on the
GP surface, 2) the density of affinity molecules on the
affinity matrix, 3) the initial ionic strength, 4) the
elution rate, and 5) the quantity of GP/(volume of
matrix) to be loaded on the column.
Transcriptional regulation of gene expression is
best understood and most effective, so we focus our
attention on the promoter. A number of promoters are
known that can be controlled by specific chemicals
added to the culture medium. For example, the lacUV5
promoter is induced if isopropylthiogalactoside (IPTG)
is added to the culture medium, for example at
concentrations between 1.0 µM and 10.0 mM. Hereinafter,
we use "XINDUCE" as a generic term for a chemical that
induces expression of a gene. If transcription of the
osp-ipbd gene is controlled by XINDUCE, then the number
of OSP-IPBDs per GP increases for increasing
concentrations of XINDUCE until a fall-off in the
number of viable packages is observed or until
sufficient IPBD is observed on the surface of harvested
GP(IPBD)s.
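The stopping rule for raising the XINDUCE concentration can be written as a short loop. This is a minimal Python sketch under stated assumptions: `measure` is a hypothetical callback standing in for whatever assays report IPBDs per GP and the fraction of viable packages at a given concentration, and `fake_measure` is an invented response curve used only to exercise the loop. It is not experimental data.

```python
from typing import Callable, Optional, Sequence, Tuple


def titrate_inducer(
    concentrations: Sequence[float],
    measure: Callable[[float], Tuple[float, float]],
    target_ipbd_per_gp: float,
    min_viable_fraction: float,
) -> Optional[float]:
    """Step up XINDUCE until display is sufficient or viability falls off.

    `measure(conc)` is a hypothetical callback returning
    (IPBDs per GP, fraction of viable packages) at that concentration.
    Returns the chosen concentration, or None if even the lowest
    concentration gives too few viable packages.
    """
    chosen = None
    for conc in sorted(concentrations):
        ipbd_per_gp, viable_fraction = measure(conc)
        if viable_fraction < min_viable_fraction:
            break  # fall-off in viable packages: keep the previous step
        chosen = conc
        if ipbd_per_gp >= target_ipbd_per_gp:
            break  # sufficient IPBD observed on the surface
    return chosen


# Purely illustrative response curve, not experimental data.
def fake_measure(conc_molar: float) -> Tuple[float, float]:
    ipbd = 100.0 * conc_molar / (conc_molar + 1e-4)
    viable = 1.0 if conc_molar < 5e-3 else 0.3
    return ipbd, viable


grid = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]  # 1 uM to 10 mM, as in the text
print(titrate_inducer(grid, fake_measure, target_ipbd_per_gp=40.0,
                      min_viable_fraction=0.5))
```

A log-spaced grid is used because the useful inducer range quoted above spans four orders of magnitude.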




The attributes that affect the maximum number of
OSP-IPBDs per GP are primarily structural in nature.
There may be steric hindrance or other unwanted
interactions between IPBDs if OSP-IPBD is substituted
for every wild-type OSP. Excessive levels of OSP-IPBD
may also adversely affect the solubility or
morphogenesis of the GP. For cellular and viral GPs,
as few as five copies of a protein having affinity for
another immobilized molecule have resulted in
successful affinity separations (FERE82a, FERE82b, and
SMIT85).


Another consideration of promoter regulation is
that it is useful later to know the range of regulation
of the osp-ipbd gene (Sec. 8). In particular, one should
determine how nearly the absence of XINDUCE leads to
the absence of IPBD on the GP surface; a non-leaky
promoter is preferred. Non-leakiness is useful: a) to
show that affinity of GP(osp-ipbd)s for AfM(IPBD) is
due to the osp-ipbd gene, and b) to allow growth of
GP(osp-pbd) in the absence of XINDUCE if the expression
of osp-pbd is disadvantageous. The lacUV5 promoter in
conjunction with the LacI^q repressor is a preferred
example.

Sec. 4.2: DNA sequence design:

The present invention is not limited to a single
method of gene design. The following procedure is an
example of one method of gene design that fills the
needs of the present invention.

If the amino-acid sequence of OSP-IPBD is a
definite sequence, then the entire gene will be
constructed (Sec. 6.1). If random DNA is to be fused
to ipbd, then a "display probe" is constructed first;
the random DNA is then inserted to complete the
population of putative osp-ipbd genes (Sec. 6.2), from
which a functional osp-ipbd gene is identified by in
vivo selection or kindred techniques.

One may use any genetic engineering method to
produce the correct gene fusion, so long as one can
easily and accurately direct mutations to specific
sites in the pbd DNA subsequence (Sec. 14.1). For the
methods of mutagenesis considered here, however, the
DNA sequence for the osp-ipbd gene must be different
from any other DNA in the OCV. The degree and nature
of difference needed is determined by the method of
mutagenesis. If one replaces subsequences coding for
the PBD with vgDNA, then the subsequences to be
mutagenized must be bounded by restriction sites that
are unique within the OCV. If
single-stranded-oligonucleotide-directed mutagenesis is
to be used, then the DNA sequence of the subsequence
coding for the IPBD must be unique within the OCV.

Regulatory elements include: a) promoters, b)
Shine-Dalgarno sequences, and c) transcriptional
terminators, and may be isolated from nature or
designed from knowledge of consensus sequences of
natural regulatory regions.

The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA.
The amino acid sequences are chosen to achieve various
goals, including: a) display of an IPBD on the surface
of a GP, b) change of charge on an IPBD, and c)
generation of a population of PBDs from which to select
an SBD. The ambiguity in the genetic code is exploited
to allow optimal placement of restriction sites and to
create various distributions of amino acids at
variegated codons.

Sec. 4.3: Specific DNA sequence assignment:

A computer program may be used to identify all 
possible ambiguous DNA sequences coding for an amino— 
acid sequence given by the user and to identify places 

4 

where recognition sites for site-specific restriction 
30 enzymes could be provided without altering the amino- 
acid sequence. 
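A minimal sketch of the second task named above, finding offsets at which a restriction site can be created purely by synonymous codon choice, is shown below. It assumes the standard genetic code, a recognition site written in plain A/C/G/T (no ambiguity codes), and a reading frame that starts at nucleotide 0; `possible_sites` is a hypothetical helper, not the program referred to in the text.

```python
# Standard genetic code (bases ordered T, C, A, G) turned into a synonym table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)


def possible_sites(protein, site):
    """Offsets (in nt) where `site` can be encoded without changing `protein`."""
    hits = []
    for start in range(3 * len(protein) - len(site) + 1):
        end = start + len(site)
        feasible = True
        # Codons are chosen independently, so each codon the site overlaps can
        # be checked on its own against the bases the site imposes on it.
        for ci in range(start // 3, (end - 1) // 3 + 1):
            required = {p - 3 * ci: site[p - start]
                        for p in range(max(start, 3 * ci), min(end, 3 * ci + 3))}
            if not any(all(codon[q] == base for q, base in required.items())
                       for codon in SYNONYMS[protein[ci]]):
                feasible = False
                break
        if feasible:
            hits.append(start)
    return hits


print(possible_sites("MHGA", "CATG"))  # Nla III site; prints [3]
```

Here the site fits only where His (CAT) is followed by Gly (GGN); a fuller program would also report the codon choices and scan a list of enzymes.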

Restriction sites are positioned within the osp-ipbd gene so that the longest segment between sites is as short as possible. Enzymes that produce cohesive ends are preferred. The codon preferences of the intended host and the secondary structure of the messenger RNA are also considered.
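The placement rule above is a minimax problem: choose sites so that the longest segment between adjacent sites (or a gene end) is as short as possible. The Python sketch below assumes the candidate positions have already been found (for example, by a search like the one described in Sec. 4.3) and that at most `k` sites can be used; it binary-searches the longest allowed segment and tests feasibility with a greedy farthest-reachable rule. It deliberately ignores the other considerations named above (cohesive ends, codon preference, mRNA structure).

```python
def sites_within(length, candidates, max_seg):
    """Fewest sites (greedy) so that no segment exceeds max_seg; None if impossible."""
    cands = sorted(candidates)
    chosen, last, i = [], 0, 0
    while length - last > max_seg:
        best = None
        # Advance to the farthest candidate still within reach of `last`.
        while i < len(cands) and cands[i] - last <= max_seg:
            if cands[i] > last:
                best = cands[i]
            i += 1
        if best is None:
            return None
        chosen.append(best)
        last = best
    return chosen


def place_sites(length, candidates, k):
    """Smallest achievable longest segment using at most k sites, plus the sites."""
    lo, hi = 1, length
    while lo < hi:
        mid = (lo + hi) // 2
        sites = sites_within(length, candidates, mid)
        if sites is not None and len(sites) <= k:
            hi = mid
        else:
            lo = mid + 1
    return lo, sites_within(length, candidates, lo)


# Hypothetical 100-bp gene with five candidate sites and room for two.
print(place_sites(100, [20, 45, 50, 70, 90], 2))  # (45, [45, 90])
```

Binary search is valid because a longer allowed segment never requires more sites.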

Sec. 5.1: Organization of gene synthesis:

An established strategy for gene synthesis is to synthesize both strands of the entire gene in overlapping segments of 20 to 50 nucleotides (nts) (THER88). We prefer an alternative method that is more suitable for synthesis of vgDNA. Our method differs from previous methods (OLIP86, OLIP87, AUSU87) in that we: a) use two synthetic strands, and b) do not cut the extended DNA in the middle. Our goals are: a) to produce longer pieces of dsDNA than can be synthesized as ssDNA on commercial DNA synthesizers, and b) to produce strands complementary to single-stranded vgDNA. By using two synthetic strands, we remove the requirement for a palindromic sequence at the 3' end.

DNA synthesizers can produce oligo-nts of up to 100 nts in reasonable yield; we denote this maximum length $M_{\mathrm{DNA}}$, so $M_{\mathrm{DNA}} = 100$. The parameters $N_w$ (the length of overlap needed to obtain efficient annealing) and $N_s$ (the number of spacer bases needed so that a restriction enzyme can cut near the end of blunt-ended dsDNA) are determined by DNA and enzyme chemistry. $N_w = 10$ and $N_s = 5$ are reasonable values.

We divide the DNA sequence to be synthesized into two nearly equal parts, each 5 to 8 bases longer than half the total length, so that there is an overlap between the two parts of 10 to 16 bp ($N_w$) containing no variegated bases. Preferably, the overlap is not palindromic and has high GC content. For each strand, we synthesize the overlap portion together with that strand's 5' extension. When these strands are annealed and completed with Klenow enzyme and all four dNTPs, we obtain the desired sequence as blunt-ended dsDNA. If the DNA is to be ligated to other DNA having cohesive ends, five to ten ($N_s$) bases are added to that end. The synthetic dsDNA can then be cut efficiently with an appropriate restriction enzyme (OLIP87).
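The division into two overlapping strands can be sketched as follows. This Python sketch assumes the top strand is written 5' to 3' with variegated positions as IUPAC ambiguity codes, and it scores candidate overlap windows by the preferences stated above (no variegated bases, non-palindromic, high GC, near the middle); the `slack` allowed around the centre and the order of the scoring criteria are illustrative assumptions. The two returned oligos would then be annealed and filled in with Klenow enzyme.

```python
# IUPAC complement table, so variegated codes (e.g. K <-> M) complement correctly.
COMP = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def revcomp(seq):
    return seq.translate(COMP)[::-1]


def split_two_strands(seq, n_w=10, m_dna=100, slack=5):
    """Return (top_oligo, bottom_oligo, overlap_start) for two-strand synthesis."""
    n = len(seq)
    if n > 2 * m_dna - n_w:
        raise ValueError("longer than 2*M_DNA - N_w")
    best = None
    for start in range(n - n_w + 1):
        window = seq[start:start + n_w]
        off_centre = abs(start + n_w / 2 - n / 2)
        if (set(window) - set("ACGT") or off_centre > slack
                or start + n_w > m_dna or n - start > m_dna):
            continue  # variegated base in overlap, too lopsided, or oligo too long
        gc = sum(b in "GC" for b in window)
        # Prefer non-palindromic, then GC-rich, then most central windows.
        score = (window == revcomp(window), -gc, off_centre)
        if best is None or score < best[0]:
            best = (score, start)
    if best is None:
        raise ValueError("no acceptable overlap window")
    start = best[1]
    return seq[:start + n_w], revcomp(seq[start:]), start


top, bottom, start = split_two_strands("ATGNNKGCATCGGCCGATCGTAGCNNKTTAAGC")
print(start, top, bottom)
```

The top oligo carries the overlap at its 3' end and the bottom oligo is the reverse complement of the remainder, so both 3' ends can be extended across the partner strand.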

Because $M_{\mathrm{DNA}}$ is not rigidly fixed at 100, the current limits of 190 nts overall ($= 2M_{\mathrm{DNA}} - N_w$) and 100 nts in each fragment are not rigid, but can be exceeded by 5 or 10 nts. Going beyond the limits of 190 and 100 will lead to lower yields, but these may be acceptable in certain cases.

Sec. 5.2: DNA synthesis and purification methods:

The present invention is not limited to any particular method of DNA synthesis or construction.

In the preferred embodiment, DNA is synthesized by standard means on a Milligen 7500 DNA synthesizer. The Milligen 7500 has seven vials from which phosphoramidites may be taken. Normally, the first four contain A, C, T, and G. The other three vials may contain unusual bases such as inosine or mixtures of bases, the so-called "dirty bottle". The standard software allows programmed mixing of two, three, or four bases in equimolar quantities.

The present invention is not limited to any particular method of purifying DNA for genetic engineering. Agarose gel electrophoresis and electroelution on an IBI device (International Biotechnologies, Inc., New Haven, CT) are preferably used to purify large dsDNA fragments. For oligo-nts, PAGE and electroelution with an Epigene device (Epigene Corp., Baltimore, MD) are an alternative to HPLC.

Sec. 6.1: Cloning of Known osp-ipbd Gene into OCV:

In the preferred method, the synthetic gene is constructed using plasmids that are transformed into bacterial cells by standard methods (MANI82, p. 250) or slightly modified standard methods. Alternatively, DNA fragments derived from nature are operably linked to other fragments of DNA derived from nature or to synthetic DNA fragments. In most cases of the preferred method, gene synthesis involves construction of a series of plasmids containing larger and larger segments of the complete gene.

Sec. 6.2: Cloning of Random DNA (Potential osp) into Display Probe:

If random DNA and phenotypic selection or screening are used to obtain a GP(IPBD), then we clone random DNA into one of the restriction sites that was designed into the display probe.

The random DNA may be obtained in a variety of ways. Degenerate synthetic DNA is one possibility. Alternatively, pseudorandom DNA may be taken from nature. If, for example, an Sph I site (GCATG/C) has been designed into the display probe at one end of the ipbd fragment, then we would use Nla III (CATG/) to partially digest DNA that contains a wide variety of sequences, generating a wide variety of fragments with CATG 3' overhangs. Preferably, the display probe has different restriction sites at each end of the ipbd gene so that random DNA can be cloned at either end.
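Partial digestion can be pictured with a small simulation. The Python sketch below assumes each Nla III site (CATG/) in a linear DNA is cut independently with probability `p_cut`, a simplification of real partial-digest kinetics; it returns top-strand pieces, and every piece except the last ends in the CATG that forms the 3' overhang compatible with an Sph I (GCATG/C) end.

```python
import random


def nla3_partial_digest(dna, p_cut, seed=None):
    """Top-strand fragments after cutting each CATG site with probability p_cut."""
    rng = random.Random(seed)
    # Nla III cuts after the G of CATG, i.e. 4 nt past the start of the site.
    sites = [i + 4 for i in range(len(dna) - 3) if dna[i:i + 4] == "CATG"]
    cuts = [c for c in sites if rng.random() < p_cut]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


print(nla3_partial_digest("AACATGTTCATGGG", 1.0))  # ['AACATG', 'TTCATG', 'GG']
```

Lowering `p_cut` shifts the population toward longer fragments, which is the knob that "a different degree of partial digestion" turns.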

A plasmid carrying the display probe is digested with the appropriate restriction enzyme, and the fragmented random DNA is annealed and ligated by standard methods. The ligated plasmids are used to transform cells that are grown and selected for expression of the antibiotic-resistance gene. Plasmid-bearing GPs are then selected for the display-of-IPBD phenotype by the procedure given in Sec. 15 of the present invention, using AfM(IPBD) as if it were the target. Sec. 15 is designed to isolate GP(PBD)s that bind to a target from a large population that do not bind.

Sec. 7: Harvest of GPs:

Cells are transformed with ligated OCVs and selected for uptake of OCV after an appropriate incubation with an agent appropriate to the selectable markers on the OCV. GPs are harvested by methods appropriate to the GP at hand, generally centrifugation to pellet GPs and resuspension of the pellets in sterile medium (cells) or buffer (spores or phage).

Sec. 8: Verification of Display Strategy:

The harvested packages are now tested for display of IPBD on the surface; any ions or cofactors known to be essential for the stability of IPBD or AfM(IPBD) must be included at appropriate levels. The tests can be done: a) by affinity labeling, b) enzymatically, c) … e) by affinity precipitation. The AfM(IPBD) in this step is one picked to have strong affinity (preferably, $K_d < 10^{-11}$ M) for the IPBD molecule and little or no affinity for the wtGP. For example, if BPTI were the IPBD, trypsin, anhydrotrypsin, or antibodies to BPTI could be used as the AfM(BPTI) to test for the presence of BPTI. Anhydrotrypsin, a trypsin derivative with serine 195 converted to dehydroalanine, has no proteolytic activity but retains its affinity for BPTI (AKOH72 and HUBE77).

Preferably, the presence of the IPBD on the surface of the GP is demonstrated through the use of a soluble, labeled derivative of an AfM(IPBD) with high affinity for IPBD. The labeled derivative of AfM(IPBD) is denoted AfM(IPBD)*.

If random DNA has been used, then the procedures of Sec. 15 are used to obtain a clonal isolate that has the display-of-IPBD phenotype. Alternatively, clonal isolates may be screened for the display-of-IPBD phenotype. The tests of this step are applied to one or more of these clonal isolates.

If no isolates that bind to the affinity molecule are obtained, we take corrective action as disclosed in Sec. 9.

If one or more of the tests indicates that the IPBD is displayed on the GP surface, we verify that the binding of molecules having known affinity for IPBD is due to the chimeric osp-ipbd gene through the use of standard genetic and biochemical techniques, such as: …

If the IPBD is displayed on the outside of the GP, and if that display is clearly caused by the introduced osp-ipbd gene, we proceed to Part II; otherwise we must analyze the result and adopt appropriate corrective measures.


Sec. 9: Perfecting the Display System:


If we have attempted to fuse an ipbd fragment to an osp fragment, our options are:

1) pick a different fusion to the same osp by:

a) using the opposite end of osp,

b) keeping more or fewer residues from osp in the fusion, for example, in increments of 3 or 4 residues,

c) trying a known or predicted domain boundary, or

d) trying a predicted loop or turn position,

2) pick a different osp, or

3) switch to the random DNA method.

If we have just tried the random DNA method unsuccessfully, our options are:

1) choose a different relationship between ipbd fragment and random DNA (ipbd first, random DNA second, or vice versa),

2) try a different degree of partial digestion, a different enzyme for partial digestion, a different degree of shearing, or a different source of natural DNA, or




3) switch to the natural OSP method. 

If all reasonable OSPs of the current GP have been tried and the random DNA method has been tried, both without success, we pick a new GP.

Part II 

Sec. 10.0: Affinity Separation Means:

In Part II we optimize an affinity separation system that will be used in Part III to enrich a population of GP(vgPBD)s for those GP(PBD)s that display PBDs with increased affinity for the target.

Affinity chromatography is the preferred means, but FACS, electrophoresis, or other means may also be used.

Sec. 10.1: Optimization of Affinity Chromatography Separation:

Changes in eluant concentration cause GPs to elute from the column. Elution volume, however, is more easily measured and specified. It is to be understood that the eluant concentration is the agent causing GP release and that an eluant concentration can be calculated from an elution volume and the specified gradient.


Using a specified elution regime, we compare the elution volumes of GP(IPBD)s with the elution volumes of wtGP on affinity columns supporting AfM(IPBD). Comparisons are made at various: a) amounts of IPBD/GP, b) densities of AfM(IPBD)/(volume of matrix) (DoAMoM), c) initial ionic strengths, d) elution rates, e) amounts of GP/(volume of support), f) pHs, and g) temperatures, because these are the parameters most likely to affect the sensitivity and efficiency of the separation. We then pick those conditions giving the best separation.

We do not optimize pH or temperature; rather, we record optimal values for the other parameters for one or more values of pH and temperature. The conditions of intended use, specified by the user (Sec. 11), may include a specification of pH or temperature. If pH is specified, then pH will not be varied in eluting the column (Sec. 15.3). Decreasing pH may be used to liberate bound GPs from the matrix. If the intended use specifies a temperature, we will hold the affinity column at the specified temperature during elution, but we might vary the temperature during recovery.

The AfM(IPBD) is preferably one known to have moderate affinity for the IPBD ($K_d$ in the range $10^{-6}$ M to $10^{-8}$ M). When populations of GP(vgPBD)s are fractionated, there will be roughly three subpopulations: a) those with no binding, b) those that have some binding but can be washed off with high salt or low pH, and c) those that bind very tightly and must be rescued in situ. We optimize the parameters to separate (a) from (b) rather than (b) from (c). Let $\mathrm{PBD}_w$ be a PBD having weak binding to the target and $\mathrm{PBD}_s$ be a PBD having strong binding. Higher DoAMoM might, for example, favor retention of GP($\mathrm{PBD}_w$) but also make it very difficult to elute viable GP($\mathrm{PBD}_s$). We will optimize the affinity separation to retain GP($\mathrm{PBD}_w$) rather than to allow release of GP($\mathrm{PBD}_s$), because a tightly bound GP($\mathrm{PBD}_s$) can be rescued by in situ growth. If we find that DoAMoM strongly affects the elution volume, then in Part III we may reduce the amount of target on the affinity column when an SBD has been found with moderately strong affinity ($K_d$ on the order of $10^{-7}$ M) for the target.

In this step, we measure elution volumes of genetically pure GPs that elute from the affinity matrix as sharp bands that can be detected by UV absorption. Samples from effluent fractions are plated on suitable medium (cells or spores) or on sensitive cells (phage), and colonies or plaques are counted.

Several values of IPBD/GP, DoAMoM, elution rates, ionic strengths, and loadings should be examined. We anticipate that optimal values of IPBD/GP and DoAMoM will be correlated and therefore should be optimized together. The effects of initial ionic strength, elution rate, and amount of GP/(matrix volume) are unlikely to be strongly correlated, and so they can be optimized independently.

For each set of parameters to be tested, the column is eluted in a specified manner. For example, we may use a regime called Elution Regime 1: a KCl gradient runs from 10 mM to the maximum allowed for GP(IPBD) viability in 100 fractions of 0.05 $V_v$ ($V_v$ = void volume), followed by 20 fractions of 0.05 $V_v$ at the maximum allowed KCl; the pH is maintained at the specified value with a convenient buffer such as Tris. It is important that the conditions of this optimization be similar to the conditions that are used in Part III for selection for binding to target (Sec. 15.3) and recovery of GPs from the chromatographic system (Sec. 15.4).
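Because eluant concentration, not volume, is what releases GPs, it helps to tabulate Elution Regime 1 both ways. The Python sketch below assumes an ideal linear KCl gradient reaching the column with no dead volume or mixing delay; the 500 mM maximum in the example is a hypothetical value standing in for the maximum allowed for GP(IPBD) viability.

```python
def regime1(kcl_max_mM, start_mM=10.0, n_grad=100, n_plateau=20, frac_vv=0.05):
    """(cumulative elution volume in V_v, KCl in mM) at the end of each fraction."""
    table = []
    for i in range(1, n_grad + n_plateau + 1):
        step = min(i, n_grad) / n_grad  # fraction of the linear gradient completed
        table.append((i * frac_vv, start_mM + (kcl_max_mM - start_mM) * step))
    return table


def kcl_at_volume(volume_vv, kcl_max_mM, start_mM=10.0, gradient_vv=5.0):
    """KCl (mM) at an elution volume in V_v; gradient_vv = 100 fractions x 0.05 V_v."""
    step = min(max(volume_vv / gradient_vv, 0.0), 1.0)
    return start_mM + (kcl_max_mM - start_mM) * step


table = regime1(500.0)  # hypothetical 500 mM viability limit
print(len(table), table[49][1], kcl_at_volume(2.5, 500.0))  # 120 255.0 255.0
```

With such a table, an observed elution volume for GP(IPBD) or wtGP converts directly to the KCl concentration that released it.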

When the osp-ipbd gene is regulated by XINDUCE, IPBD/GP can be controlled by varying [XINDUCE]. Appropriate values of [XINDUCE] depend on the identity of XINDUCE and the promoter; if, for example, XINDUCE is isopropyl thiogalactoside (IPTG) and the promoter is lacUV5, then [IPTG] = 0, 0.1 µM, 1.0 µM, 10.0 µM, 100.0 µM, and 1.0 mM are appropriate levels to test. The range of variation of [XINDUCE] is extended until an optimum is found or an acceptable level of expression is obtained.

DoAMoM is varied from the maximum that the matrix material can bind to 1% or 0.1% of this level in appropriate steps. We anticipate that the efficiency of separation will be a smooth function of DoAMoM, so that it is appropriate to cover a wide range of values for DoAMoM with a coarse grid and then explore the neighborhood of the approximate optimum with a finer grid.
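The coarse-then-fine search over DoAMoM can be sketched as below. `separation_quality` is a hypothetical stand-in for a measured figure of merit from one column run at a given DoAMoM (for example, how cleanly GP(IPBD) and wtGP are resolved); the log-spaced range from the maximum down to 0.1% of it follows the text, while the grid sizes are assumptions.

```python
import math


def coarse_to_fine(separation_quality, d_max, low_frac=1e-3, n_coarse=7, n_fine=9):
    """Best DoAMoM from a log-spaced coarse grid, refined on a finer local grid."""
    lo, hi = math.log10(d_max * low_frac), math.log10(d_max)
    coarse = [lo + (hi - lo) * i / (n_coarse - 1) for i in range(n_coarse)]
    best = max(range(n_coarse), key=lambda i: separation_quality(10 ** coarse[i]))
    # Refine between the neighbours of the best coarse point.
    a = coarse[max(best - 1, 0)]
    b = coarse[min(best + 1, n_coarse - 1)]
    fine = [a + (b - a) * i / (n_fine - 1) for i in range(n_fine)]
    return 10 ** max(fine, key=lambda x: separation_quality(10 ** x))


# Toy quality curve peaking at 5% of the maximum density (illustrative only).
peak = math.log10(0.05)
print(coarse_to_fine(lambda d: -(math.log10(d) - peak) ** 2, d_max=1.0))
```

A log scale suits the stated range, which spans three orders of magnitude; each evaluation corresponds to one column run, so the grid sizes set the experimental cost.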

Several values of initial ionic strength are tested, such as 1.0 mM, 5.0 mM, 10.0 mM, and 20.0 mM.

The elution rate is varied, by successive factors of 1/2, from the maximum attainable rate to 1/16 of this value. The fastest elution rate giving good separation is optimal.

The goal of the optimization is to obtain a sharp transition between bound and unbound GPs, triggered by increasing salt, decreasing pH, or a combination of both. This optimization need be performed only: a) for each temperature to be used, b) for each pH to be used, and c) when a new GP(IPBD) is created.

Regulatable promoters are available for all genetic packages except, possibly, bacterial spores. A promoter functional in bacterial spores might be prepared by constructing a hybrid of a sporulation promoter and a regulatable bacterial promoter (e.g., lac), or by saturation mutagenesis of a sporulation promoter followed by screening for regulatable promoter activity (cf. OLIP86, OLIP87). When the promoter of the osp-ipbd gene is not regulatable, we optimize DoAMoM, the elution rate, and the amount of GP/volume of matrix. If the optimized affinity separation is not acceptable, we must develop a means to alter the amount of IPBD per GP.


Sec. 10.2: Measuring the sensitivity of affinity separation:

We determine the sensitivity of the affinity separation ($C_{\mathrm{sens}}$) by measuring the minimum quantity of GP(IPBD) that can be detected in the presence of a large excess of wtGP. The user chooses a number of separation cycles, denoted $N_{\mathrm{chrom}}$, that will be performed before an enrichment is abandoned; preferably, $N_{\mathrm{chrom}}$ is in the range 6 to 10, and $N_{\mathrm{chrom}}$ must be greater than 4. Enrichment can be terminated by isolation of a desired GP(SBD) before $N_{\mathrm{chrom}}$ passes.

The measurement of sensitivity is significantly expedited if GP(IPBD) and wtGP carry different selectable markers.

Mixtures of GP(IPBD) and wtGP are prepared in the ratio 1:$V_{\mathrm{lim}}$, where $V_{\mathrm{lim}}$ ranges by an appropriate factor (e.g. 1/10) over an appropriate range, typically $10^{11}$ through $10^{4}$. Large values of $V_{\mathrm{lim}}$ are tested first; once a positive result is obtained for one value of $V_{\mathrm{lim}}$, no smaller values of $V_{\mathrm{lim}}$ need be tested.

Each mixture is applied to a column supporting, at the optimal DoAMoM, an AfM(IPBD) having high affinity for IPBD, and the column is eluted by the specified elution regime. The last fraction that contains viable GPs and an inoculum of the column matrix material are cultured. If GP(IPBD) and wtGP have different selectable markers, then transfer onto selection plates identifies each colony. Otherwise, a number (e.g. 32) of GP clonal isolates are tested for presence of IPBD by the techniques discussed in Sec. 8.

If IPBD is not detected on the surface of any of the isolated GPs, then GPs are pooled from: a) the last few (e.g. 3 to 5) fractions that contain viable GPs, and b) an inoculum taken from the column matrix. The pooled GPs are cultured, passed over the same column, and enriched for GP(IPBD) in the manner described. This process is repeated until $N_{\mathrm{chrom}}$ passes have been performed or until the IPBD has been detected on the GPs. If GP(IPBD) is not detected after $N_{\mathrm{chrom}}$ passes, $V_{\mathrm{lim}}$ is decreased and the process is repeated.

$C_{\mathrm{sens}}$ equals the highest value of $V_{\mathrm{lim}}$ for which the user can recover GP(IPBD) within $N_{\mathrm{chrom}}$ passes.
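To see how the choice of $N_{\mathrm{chrom}}$ bounds $C_{\mathrm{sens}}$, the procedure can be mimicked with a toy model. The Python sketch below assumes that every pass multiplies the GP(IPBD):wtGP ratio by a constant factor `c_eff`, and that IPBD counts as detected once GP(IPBD) makes up at least 1/32 of recovered GPs (loosely matching the 32 clonal isolates screened above); both are simplifying assumptions, not measurements.

```python
def passes_needed(v_lim, c_eff, detect_fraction=1 / 32):
    """Passes until GP(IPBD) is at least detect_fraction of recovered GPs."""
    if c_eff <= 1:
        raise ValueError("no enrichment per pass")
    ratio, passes = 1.0 / v_lim, 0  # GP(IPBD):wtGP ratio of the loaded mixture
    while ratio / (1.0 + ratio) < detect_fraction:
        ratio *= c_eff
        passes += 1
    return passes


def estimate_c_sens(c_eff, n_chrom, v_values):
    """Highest V_lim recoverable within n_chrom passes, testing large values first."""
    for v in sorted(v_values, reverse=True):
        if passes_needed(v, c_eff) <= n_chrom:
            return v  # no smaller V_lim needs to be tested
    return None


print(estimate_c_sens(c_eff=10.0, n_chrom=6, v_values=[10**k for k in range(4, 12)]))
# 10000000
```

Under this model, raising $N_{\mathrm{chrom}}$ by one pass extends $C_{\mathrm{sens}}$ by roughly one factor of `c_eff`.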

The number of chromatographic cycles ($K_{\mathrm{cyc}}$) that were needed to isolate GP(IPBD) gives a rough estimate of $C_{\mathrm{eff}}$: $C_{\mathrm{eff}}$ is approximately the $K_{\mathrm{cyc}}$-th root of $V_{\mathrm{lim}}$:

$$C_{\mathrm{eff}} \approx \exp\left(\frac{\ln V_{\mathrm{lim}}}{K_{\mathrm{cyc}}}\right) = V_{\mathrm{lim}}^{1/K_{\mathrm{cyc}}}$$

For example, if $V_{\mathrm{lim}}$ were $4.0 \times 10^{8}$ and three separation cycles were needed to isolate GP(IPBD), then $C_{\mathrm{eff}} \approx 737$.
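The rough estimate is easy to check numerically; this short Python snippet reproduces the worked example above.

```python
import math


def c_eff_rough(v_lim, k_cyc):
    """K_cyc-th root of V_lim, written as exp(ln(V_lim) / K_cyc)."""
    return math.exp(math.log(v_lim) / k_cyc)


print(round(c_eff_rough(4.0e8, 3)))  # 737
```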

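Sec. 10.3: Measuring the efficiency of separation:

To determine $C_{\mathrm{eff}}$ more accurately, we determine the ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD) column that yields approximately equal amounts of GP(IPBD) and wtGP after elution. Because one pass enriches GP(IPBD) relative to wtGP by about $C_{\mathrm{eff}}$, the wtGP:GP(IPBD) ratio loaded at that point approximates $C_{\mathrm{eff}}$.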
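In practice the loading that gives exactly equal recovery will fall between tested loadings. The Python sketch below assumes a few runs at different wtGP:GP(IPBD) loadings, each reporting the GP(IPBD):wtGP ratio recovered, and interpolates on log scales between the two runs that bracket equal recovery; the input format and the example numbers are hypothetical.

```python
import math


def load_for_equal_recovery(runs):
    """runs: (wtGP per GP(IPBD) loaded, GP(IPBD) per wtGP recovered) pairs.

    Returns the interpolated loading ratio at which recovery is 1:1.
    """
    pts = sorted((math.log10(load), math.log10(out)) for load, out in runs)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y0 == 0.0:
            return 10 ** x0
        if y0 * y1 < 0:  # equal recovery lies between these two loadings
            return 10 ** (x0 + (0.0 - y0) * (x1 - x0) / (y1 - y0))
    if pts and pts[-1][1] == 0.0:
        return 10 ** pts[-1][0]
    raise ValueError("runs do not bracket equal recovery")


# Hypothetical runs consistent with C_eff = 500: recovered ratio = 500 / load.
print(load_for_equal_recovery([(100, 5.0), (1000, 0.5), (10000, 0.05)]))
```

Log-scale interpolation is used because loadings are varied over orders of magnitude and, under a constant-enrichment model, log(recovered ratio) falls linearly with log(loading).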

Sec. 10.4: Other Separation Means:

Other separation means are optimized in a manner parallel to that used for affinity chromatography.

FACS (e.g. FACStar from Becton Dickinson, Mountain View, CA) is most appropriate for bacterial cells and spores because the sensitivity of the machines requires approximately 1000 molecules of fluorescent label bound to each GP to accomplish a separation. To optimize FACS separation of GPs, we use a derivative of AfM(IPBD) that is labeled with a fluorescent molecule, denoted AfM(IPBD)*. The variables that must be optimized include: a) amount of IPBD/GP, b) concentration of AfM(IPBD)*, c) ionic strength, d) concentration of GPs, and e) parameters pertaining to operation of the FACS machine. Because AfM(IPBD)* and GPs interact in solution, the binding will be linear in both [AfM(IPBD)*] and [displayed IPBD]. Preferably, these two parameters are varied together. The other parameters can be optimized independently.


Electrophoresis is most appropriate for bacteriophage because of their small size (SERW87). Electrophoresis is a preferred separation means if the target is so small that chemically attaching it to a column or to a fluorescent label would essentially change the entire target. For example, chloroacetate ions contain only seven atoms and would be essentially altered by any linkage. GPs that bind chloroacetate would become more negatively charged than GPs that do not bind the ion, and so these classes of GPs could be separated.

The parameters to optimize for electrophoresis include: a) IPBD/GP, b) concentration of gel material, e.g. agarose, c) concentration of AfM(IPBD), d) ionic strength, e) size, shape, and cooling capacity of the electrophoresis apparatus, f) voltages and currents, and g) concentration of GPs. Preferably, IPBD/GP and [AfM(IPBD)] are varied at the same time, and other parameters are optimized independently.

Part III 

Sec. 11.0: Choice of target material:

Any material may be chosen as target material, subject only to the following restrictions:

If affinity chromatography is to be used, then:

1) the molecules of the target material must be of sufficient size and chemical reactivity to be applied to a solid support suitable for affinity separation,

2) after application to a matrix, the target material must not react with water,

3) after application to a matrix, the target material must not bind or degrade proteins in a non-specific way, and

4) the molecules of the target material must be sufficiently large that attaching the material to a matrix allows enough unaltered surface area (generally at least 500 Å², excluding the atom that is connected to the linker) for protein binding.


If FACS is to be used as the affinity separation means, then:

1) the molecules of the target material must be of sufficient size and chemical reactivity to be conjugated to a suitable fluorescent dye, or the target must itself be fluorescent,

2) after any necessary fluorescent labeling, the target must not react with water,

3) after any necessary fluorescent labeling, the target material must not bind or degrade proteins in a non-specific way, and

4) the molecules of the target material must be sufficiently large that attaching the material to a suitable dye allows enough unaltered surface area (generally at least 500 Å², excluding the atom that is connected to the linker) for protein binding.

If affinity electrophoresis is to be used, then:

1) the target must either be charged or of such a nature that its binding to a protein will change the charge of the protein,

2) the target material must not react with water,

3) the target material must not bind or degrade proteins in a non-specific way, and

4) the target must be compatible with a suitable gel material.

Possible target materials include, but are not limited to: a) soluble proteins (such as horse heart myoglobin, human neutrophil elastase, activated (blood clotting) factor X, alpha-fetoprotein, alpha interferon, melittin, Bordetella pertussis adenylate cyclase toxin, any retroviral pol protease, or any retroviral gag protease), b) lipoproteins (such as human low density lipoprotein), c) glycoproteins (such as a monoclonal antibody), d) lipopolysaccharides (such as O-antigen of Salmonella enteritidis), e) nucleic acids (such as tRNAs, ribosomal RNAs, messenger RNAs, dsDNA, or ssDNA, possibly with sequence specificity), f) soluble organic molecules (such as cholesterol, aspartame, bilirubin, morphine, codeine, dichlorodiphenyltrichloroethane (DDT), benzo(a)pyrene, prostaglandin PGE2, protoporphyrin IX, or actinomycin D), g) organometallic complexes (such as iron haem or cobalt haem), h) organic polymers (such as cellulose or chitin), i) insoluble minerals (such as asbestos, zeolites, or hydroxylapatite), j) viral and phage coat proteins (such as influenza haemagglutinin or phage lambda capsid), and k) bacterial membrane or outer membrane proteins (such as … from E. coli, or flagella proteins).


A supply of several milligrams of pure target material is desired. Impure target material could be used, but one might obtain a protein that binds to a contaminant instead of to the target.

The following information about the target material is highly desirable:

1) stability as a function of temperature, pH, and ionic strength,

2) stability with respect to chaotropes such as urea or guanidinium Cl,

3) pI,

4) molecular weight,

5) requirements for prosthetic groups or ions, such as haem or Ca²⁺, and

6) proteolytic activity, if any.

In addition to this most desirable information, it is useful to know: 1) the target's sequence, if the target is a macromolecule, 2) the 3D structure of the target, 3) enzymatic activity, if any, and 4) toxicity, if any.

The user of the present invention specifies certain parameters of the intended use of the binding protein:

1) the acceptable temperature range,

2) the acceptable pH range,

3) the acceptable concentrations of ions and neutral solutes, and

4) the maximum acceptable dissociation constant for the target and the SBD:

$$K_d = \frac{[\mathrm{Target}][\mathrm{SBD}]}{[\mathrm{Target{:}SBD}]}$$

In some cases, the user may require discrimination between T, the target, and N, some non-target. Let

$$K_T = \frac{[T][\mathrm{SBD}]}{[T{:}\mathrm{SBD}]}, \qquad K_N = \frac{[N][\mathrm{SBD}]}{[N{:}\mathrm{SBD}]};$$

then

$$\frac{K_T}{K_N} = \frac{[T][N{:}\mathrm{SBD}]}{[N][T{:}\mathrm{SBD}]}.$$

The user then specifies a maximum acceptable value for the ratio $K_T/K_N$.
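The dissociation constants and the discrimination ratio follow directly from the definitions above. This Python sketch assumes all concentrations are equilibrium values in molar units measured in one solution containing both T and N, so that the same free [SBD] appears in $K_T$ and $K_N$ and cancels in the ratio; the concentrations in the example are hypothetical.

```python
import math


def dissociation_constant(free_ligand, free_sbd, complex_conc):
    """K = [ligand][SBD] / [ligand:SBD], all in mol/L."""
    return free_ligand * free_sbd / complex_conc


def discrimination_ratio(t, n, t_sbd, n_sbd):
    """K_T / K_N = ([T][N:SBD]) / ([N][T:SBD]); the free [SBD] cancels."""
    return (t * n_sbd) / (n * t_sbd)


# Hypothetical equilibrium concentrations (M).
sbd, t, n, t_sbd, n_sbd = 1e-9, 1e-6, 1e-6, 1e-6, 1e-9
k_t = dissociation_constant(t, sbd, t_sbd)  # about 1e-9 M
k_n = dissociation_constant(n, sbd, n_sbd)  # about 1e-6 M
ratio = discrimination_ratio(t, n, t_sbd, n_sbd)
print(math.isclose(ratio, k_t / k_n), ratio)
```

A candidate SBD would meet the user's discrimination requirement if `ratio` does not exceed the specified maximum.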

If the target material is a general protease, one must consider the following points:

1) a highly specific protease can be treated like any other target,

2) a general protease, such as subtilisin, may degrade the OSPs of the GP, including OSP-PBDs; there are several alternative ways of dealing with general proteases, including: a) a chemical inhibitor may be used to prevent proteolysis (e.g. phenylmethylsulfonyl fluoride (PMSF), which inhibits serine proteases), b) one or more active-site residues may be mutated to create an inactive protein (e.g. a serine protease in which the active serine is mutated to alanine), or c) one or more active-site amino acids of the protein may be chemically modified to destroy the catalytic activity (e.g. a serine protease in which the active serine is converted to anhydroserine),

3) SBDs selected for binding to a protease need not be inhibitors; SBDs that happen to inhibit the protease target are a fairly small subset of SBDs that bind to the protease target,

4) the more we modify the target protease, the less likely we are to obtain an SBD that inhibits the target protease, and

5) if the user requires that the SBD inhibit the target protease, then the active site of the target protease must not be modified any more than necessary; inactivation by mutation or chemical modification are preferred methods of inactivation, and a protein protease inhibitor becomes a prime candidate for IPBD. For example, BPTI could be mutated, by the methods of the present invention, to bind to proteases other than trypsin (TANK77 and TSCH87).
Sec. 12.0: Choice of GP(IPBD):

The user must pick a GP(IPBD) that is suitable to the chosen target according to the criteria of Sec. 2.

It is anticipated that a small collection of GP(IPBD)s can be assembled such that, for any chosen target, at least one member of the collection will be a suitable starting point for engineering a protein that binds to the chosen target by the methods of the present invention. The user should optimize the affinity separation for conditions appropriate to the intended use by the methods described in Part II.

Sec. 13.0: Identification of Family of PBDs, Related to PPBD, to Be Generated:

Sec. 13.1: Choosing residues in IPBD or other PPBD to vary:

We choose residues in the IPBD to vary through consideration of several factors, including: a) the 3D structure of the IPBD, b) sequences homologous to IPBD, and c) modeling of the IPBD and mutants of the IPBD. Because the number of residues that could strongly influence binding is always greater than the number that can be varied simultaneously, the user must pick a subset of those residues to vary at one time. The user must also pick trial levels of variegation and calculate the abundances of various sequences. The list of varied residues and the level of variegation at each varied residue are adjusted until the composite variegation is commensurate with $C_{\mathrm{sens}}$ and $M_{\mathrm{ntv}}$.
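Calculating the abundances of sequences for a trial variegation scheme can be sketched as below. The Python sketch assumes each varied residue is described by a probability distribution over amino acids (for example, derived from the base mixtures used at that codon) and that residues vary independently; `expected_copies_of_rarest` and the transformant count in the example are hypothetical aids for judging whether a scheme is commensurate with the number of transformants that can be handled.

```python
from math import prod


def composite_variegation(position_dists):
    """(distinct sequences, most-abundant probability, rarest probability)."""
    n_seq = prod(len(d) for d in position_dists)
    p_max = prod(max(d.values()) for d in position_dists)
    p_min = prod(min(d.values()) for d in position_dists)
    return n_seq, p_max, p_min


def expected_copies_of_rarest(p_min, n_transformants):
    """Mean number of times the rarest sequence appears among transformants."""
    return p_min * n_transformants


# Hypothetical scheme: two residues with all 20 amino acids equally likely,
# one residue limited to Leu (75%) or Met (25%).
uniform = {aa: 0.05 for aa in "ACDEFGHIKLMNPQRSTVWY"}
dists = [uniform, uniform, {"L": 0.75, "M": 0.25}]
n_seq, p_max, p_min = composite_variegation(dists)
print(n_seq, expected_copies_of_rarest(p_min, 1e6))
```

Adding a varied residue or widening a residue's distribution raises `n_seq` and lowers `p_min`; the trial list is trimmed until the rarest sequences are still expected to be present.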

A key concept is that only structured proteins exhibit specific binding, i.e. can bind to a particular chemical entity to the exclusion of most others. Thus the residues to be varied are chosen with an eye to preserving the underlying IPBD structure. Substitutions that prevent the PBD from folding will cause GPs carrying those genes to bind indiscriminately, so that they can easily be removed from the population.

Burial of hydrophobic surfaces so that bulk water is excluded is one of the strongest forces driving the binding of proteins to other molecules. Bulk water can be excluded from the region between two molecules only if the surfaces are complementary. We must test as many surfaces as possible to find one that is complementary to the target. The selection-through-binding isolates those proteins that are more nearly complementary to some surface on the target. The effective diversity of a variegated population is measured by the number of different surfaces, rather than the number of protein sequences. Thus we should maximize the number of surfaces generated in our population, rather than the number of protein sequences.

In hypothetical example 1, we consider a hypothetical PBD, shown in Figure 3, binding to a hypothetical target. Figure 3 is a 2D schematic of 3D objects; by hypothesis, residues 1, 2, 4, 6, 7, 13, 14, 15, 20, 21, 22, 27, 29, 31, 33, 34, 36, 37, 38, and 39 of the IPBD are on the 3D surface of the IPBD, even though shown well inside the circle. Proteins do not have distinct, countable faces. Therefore we define an "interaction set" to be a set of residues such that all members of the set can simultaneously touch one molecule of the target material without any atom of the target coming closer than van der Waals distance to any main-chain atom of the IPBD. The concept of a residue "touching" a molecule of the target is discussed below. One hypothetical interaction set, Set A, in Figure 3 comprises residues 6, 7, 20, 21, 22, 33, and 34, represented by squares. Another hypothetical interaction set, Set B, comprises residues 1, 2, 4, 6, 31, 37, and 39, represented by circles.

If we vary one residue, number 21 for example, 
10 through all twenty amino acids, we obtain 20 protein 
sequences and 20 different surfaces for interaction set 
A. Note that residue 6 is in two interaction sets and 
variation of residue 6 through all 20 amino acids 
yields 20 versions of interaction set A and 20 versions 
15 of interaction set B. 

Now consider varying two residues, each through
all twenty amino acids, generating 400 protein
sequences. If the two residues varied were, for
example, number 1 and number 21, then there would be
only 40 different surfaces, because interaction set A
does not depend on residue 1 and interaction set B does
not depend on residue 21. If the two residues varied,
however, were number 7 and number 21, then 400 surfaces
would be generated.

If N spatially separated residues (each in a
different interaction set) are varied at one time,
20 x N surfaces are generated. Variation of N residues
in the same interaction set yields 20^N surfaces. For
example, if N = 7, variation of separated residues
yields 140 surfaces, while variation of interacting
residues yields 20^7 = 1.28 x 10^9 surfaces. Thus, to
maximize the number of surfaces generated when N
residues are varied, all residues should be in the same
interaction set.
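The counting rule can be checked with a short sketch. It assumes that an interaction set containing k varied residues contributes 20^k versions and that sets containing no varied residue are not counted; the residue sets are the hypothetical Sets A and B of Figure 3, and the function names are illustrative only.

```python
from typing import Iterable, Set

# Hypothetical interaction sets from hypothetical example 1 (Figure 3).
SET_A: Set[int] = {6, 7, 20, 21, 22, 33, 34}
SET_B: Set[int] = {1, 2, 4, 6, 31, 37, 39}


def count_surfaces(varied: Iterable[int], interaction_sets: Iterable[Set[int]],
                   n_aa: int = 20) -> int:
    """Count distinct versions of the interaction sets produced when every
    residue in `varied` runs through n_aa amino acids.

    A set containing k varied residues contributes n_aa**k versions; sets
    with no varied residue are unchanged and therefore not counted.
    """
    varied_set = set(varied)
    total = 0
    for s in interaction_sets:
        k = len(s & varied_set)
        if k:
            total += n_aa ** k
    return total


def count_sequences(varied: Iterable[int], n_aa: int = 20) -> int:
    """Number of protein sequences: n_aa to the power of the residues varied."""
    return n_aa ** len(set(varied))


if __name__ == "__main__":
    sets = [SET_A, SET_B]
    for varied in ([21], [6], [1, 21], [7, 21]):
        print(varied, count_sequences(varied), count_surfaces(varied, sets))
    # Seven residues, each alone in its own set, versus all seven in one set.
    print(count_surfaces(range(7), [{i} for i in range(7)]))  # 140
    print(count_surfaces(range(7), [set(range(7))]))          # 1280000000
```

Varying residues 1 and 21 gives 400 sequences but only 40 surfaces, while varying 7 and 21 gives 400 of each, matching the argument above.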


The amount of surface area buried in strong
protein-protein interactions ranges from 1000 Å² to
2000 Å² (SCHU79, p103ff). Individual amino acids have
total surface areas that depend mostly on the type of
amino acid and weakly on conformation. These areas
range from about 180 Å² for glycine to about 360 Å² for
tryptophan. From amino-acid solvent exposures of
published protein structures, we calculate that
1000 Å² on a protein surface comprises between 4 and 30
amino-acid residues. Varied amino-acid sequences, as
found in actual proteins, involve between 10 and 25
residues in forming 1000 Å² of protein surface. Schulz
and Schirmer estimate that 100 Å² of protein surface can
exhibit as many as 1000 different specific patterns
(SCHU79, p105). The number of surface patterns rises
exponentially with the area that can be varied
independently. One of the BPTI structures recorded in
the Brookhaven Protein Data Bank (6PTI), for example,
has a total exposed surface area of 3997 Å² (using the
method of Lee and Richards (LEEB71), a solvent radius
of 1.4 Å, and atomic radii as shown in Table 7). If we
could vary this surface freely, and if 100 Å² can
produce 1000 patterns, we could construct about 10^120
different patterns by varying the surface of BPTI
(3997 Å² is about 40 patches of 100 Å², and
1000^40 = 10^120). This calculation is intended only to
suggest the huge number of possible surface patterns
based on a common protein backbone.
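The order-of-magnitude estimate can be reproduced as follows. This is a minimal sketch that assumes, as the text does, that each 100 Å² patch varies independently and can show 1000 patterns; the helper name is hypothetical.

```python
import math


def log10_surface_patterns(total_area: float, patch_area: float = 100.0,
                           patterns_per_patch: int = 1000) -> float:
    """log10 of the number of surface patterns when a surface of total_area
    (Å^2) is treated as independent patches of patch_area (Å^2), each able
    to show patterns_per_patch different patterns."""
    n_patches = total_area / patch_area
    return n_patches * math.log10(patterns_per_patch)


# Exposed surface of BPTI (6PTI) quoted in the text: 3997 Å^2.
exponent = log10_surface_patterns(3997.0)
print(f"about 10^{round(exponent)} patterns")  # about 10^120 patterns
```

Because the backbone holds the side groups in roughly fixed positions, the patches are not truly independent, so this figure is an upper bound rather than an achievable library size.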

One protein framework cannot, however, display all
possible patterns over any one particular 100 Å² of
surface merely by replacement of the side groups of
surface residues. The protein backbone holds the
varied side groups in approximately constant locations,
so that the variations are not independent. We can,
nevertheless, generate a vast collection of different
protein surfaces by varying those protein residues that
face the outside of the protein.

Examination of a model of BPTI in contact with
myoglobin shows that residues 3, 7, 8, 10, 13, 39, 41,
and 42 can all simultaneously contact a molecule the
size and shape of myoglobin. Residue 49 cannot touch a
single myoglobin molecule simultaneously with any of
the first set, even though all are on the surface of
BPTI. It is not the intent of the present invention,
however, to use models to determine which part of the
target molecule will actually be the site of binding by
a PBD.

For cassette mutagenesis, the protein residues to
be varied are, preferably, close enough in sequence
that the variegated DNA (vgDNA) encoding all of them
can be made in one piece. The present invention is not
limited to a particular length of vgDNA that can be
synthesized. With current technology, a stretch of 60
amino acids (180 DNA bases) can be spanned.

One can use other mutational means, such as
single-stranded-oligonucleotide-directed mutagenesis
(BOTS85) using two or more mutating primers to mutate
widely separated residues.

Alternatively, to vary residues separated by more
than sixty residues, two cassettes may be mutated. A
first cassette is mutagenized to produce a population
having, for example, up to 30,000 members. Using the
variegated OCV, we then mutagenize a second cassette to
produce a second variegated population having the
desired diversity.

The composite level of variation must not exceed
the prevailing capabilities to a) produce very large
numbers of independently transformed cells or b) detect
small components in a highly varied population. The
limits on the level of variegation are discussed in
Sec. 13.2.

We assemble the data about the IPBD and the target
that are useful in deciding which residues to vary:
1) the 3D structure of the IPBD, or at least a list of
residues on its surface, 2) a list of sequences
homologous to the IPBD, and 3) a model of the target
molecule or of a stand-in for the target.

These data and an understanding of the behavior of
different amino acids in proteins will be used to
answer two questions:

1) which residues of the IPBD are on the outside
and close enough together in space to touch the
target simultaneously?

2) which residues of the IPBD can be varied with
high probability of retaining the underlying IPBD
structure?

Although an atomic model of the target material
from X-ray crystallography, NMR, etc. is preferred in
such examination, it is not necessary. For example, if
the target were a protein of unknown 3D structure, it
would be sufficient to know the molecular weight of the
protein and whether it were a soluble globular protein,
a fibrous protein, or a membrane protein. One can then
choose a protein of known structure of the same class
and similar size and shape to use as a molecular
stand-in and yardstick. At low resolution, all proteins
of a given size and class look much the same. The
specific volumes are the same, and all are more or less
spherical; therefore all proteins of the same size and
class have about the same radius of curvature. The
radii of curvature of the two molecules determine how
much of the two molecules can come into contact.

The most appropriate method of picking the
residues of the protein chain at which the amino acids
should be varied is by viewing, with interactive
computer graphics, a model of the IPBD. A stick-figure
representation of molecules is preferred. A suitable
set of hardware is an Evans & Sutherland PS390 graphics
terminal (Evans & Sutherland Corporation, Salt Lake
City, UT) and a MicroVAX II supermicro computer
(Digital Equipment Corp., Maynard, MA). Suitable
programs for viewing and manipulating protein models
include: a) PS-FRODO, written by T. A. Jones (JONE85)
and distributed by the Biochemistry Department of Rice
University, Houston, TX; and b) PROTEUS, developed by
Dayringer, Tramantano, and Fletterick (DAYR86).

Theoretical calculations, such as dynamic
simulations of proteins, are used to estimate the
effect of substitution at a particular residue of a
particular amino-acid type on the 3D structure of the
parent protein. Such calculations might also indicate
whether a particular substitution will greatly affect
the flexibility of the protein.


Sec. 13.1.1: The principal set:

Using the knowledge of which residues are on the
surface of the IPBD, we pick residues that are close
enough together on the surface of the IPBD to touch a
molecule of the target simultaneously without having
any IPBD main-chain atom come closer than van der Waals
distance (viz. 4.0 to 5.0 Å) to any target atom. A
residue of the IPBD "touches" the target if: a) a
main-chain atom is within van der Waals distance, viz.
4.0 to 5.0 Å, of any atom of the target molecule, or
b) the Cβ is within D_cutoff of any atom of the target
molecule, so that a side-group atom could make contact
with that atom. Because side groups differ in size
(cf. Table 35), some judgment is required in picking
D_cutoff. In the preferred embodiment, we will use
D_cutoff = 8.0 Å, but other values in the range 6.0 Å
to 10.0 Å could be used. If the IPBD has G at a
residue, we construct a pseudo-Cβ with the correct bond
distance and angles and judge the ability of the
residue to touch the target from this pseudo-Cβ.
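The "touches" test can be sketched as below. This is an illustration under stated assumptions, not the method of the invention: residues are hypothetical dicts of atom coordinates in Å, the van der Waals cutoff is set to the upper end (5.0 Å) of the 4.0 to 5.0 Å range, and the pseudo-Cβ for glycine uses an ideal-geometry formula common in structure-modelling code rather than anything specified in the text.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

Vec = Tuple[float, float, float]  # Cartesian coordinates in Å

VDW_CONTACT = 5.0  # assumed: upper end of the 4.0-5.0 Å van der Waals range
D_CUTOFF = 8.0     # preferred D_cutoff from the text


def _sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def pseudo_cb(n: Vec, ca: Vec, c: Vec) -> Vec:
    """Virtual C-beta from backbone N, CA, C (assumed ideal-geometry
    coefficients; places the atom about 1.53 Å from CA)."""
    b = _sub(ca, n)
    cc = _sub(c, ca)
    a = _cross(b, cc)
    return tuple(-0.58273431 * ai + 0.56802827 * bi - 0.54067466 * ci + xi
                 for ai, bi, ci, xi in zip(a, b, cc, ca))


def touches(residue: Dict[str, Vec], target_atoms: Iterable[Vec],
            vdw: float = VDW_CONTACT, d_cutoff: float = D_CUTOFF) -> bool:
    """a) any main-chain atom within vdw of a target atom, or
    b) C-beta (pseudo-C-beta for Gly) within d_cutoff of a target atom."""
    targets = list(target_atoms)
    main_chain = [residue[k] for k in ("N", "CA", "C", "O") if k in residue]
    if any(math.dist(m, t) <= vdw for m in main_chain for t in targets):
        return True
    cb: Optional[Vec] = residue.get("CB")
    if cb is None:  # glycine: judge contact from a pseudo-C-beta
        cb = pseudo_cb(residue["N"], residue["CA"], residue["C"])
    return any(math.dist(cb, t) <= d_cutoff for t in targets)
```

Running `touches` over every surface residue of the IPBD against a target (or stand-in) model gives candidate members of one interaction set; the principal set is then chosen among residues that touch the same target molecule at once.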

Alternatively, we choose a set of residues on the
surface of the IPBD such that the curvature of the
surface defined by the residues in the set is not so
great that it would prevent contact between all
residues in the set and a molecule of the target. This
method is appropriate if the target is a macromolecule,
such as a protein, because the PBDs derived from the
IPBD will contact only a part of the macromolecular
surface.

We prefer that there be some indication that the
underlying IPBD structure will tolerate substitutions
at each residue in the principal set of residues.
Indications could come from various sources, including:
a) homologous sequences, b) static computer modeling,
or c) dynamic computer simulations.

The residues in the principal set need not be
contiguous in the protein sequence. We require only
that the amino acids in the residues to be varied all
be capable of touching a molecule of the target
material simultaneously without having atoms overlap.
If the target were, for example, horse heart myoglobin,
and if the IPBD were BPTI, any set of residues in one
interaction set of BPTI defined in Table 34 could be
picked.

Preferably, the principal set contains eight to
sixteen residues. This number of residues allows
sufficient variability that a surface that is
complementary to the target can be found, but is small
enough that a significant fraction of the surface can
be varied at one time.

Sec. 13.1.2: The secondary set:

The secondary set comprises residues that touch
residues in the principal set and are excluded from the
principal set because the residue: a) is internal, b) is
highly conserved, or c) is on the surface, but the
curvature of the IPBD surface prevents the residue from
being in contact with the target at the same time as
one or more residues in the principal set.

Internal residues, although frequently conserved,
may tolerate some conservative changes such as I to L
or F to Y. These changes affect the detailed placement
and dynamics of adjacent protein residues, and such
variation may be useful once an SBD is found.
Surface residues in the secondary set are most
often located on the periphery of the principal set and
do not make direct contact with the target
simultaneously with all other residues of the principal
set. The charge on the amino acid in one of these
residues could, however, have a strong effect on
binding. It is appropriate to vary the charge of some
or all of these residues to improve an SBD. For
example, the variegated codon containing equimolar A
and G at base 1, equimolar C and A at base 2, and A at
base 3 (codons ACA, GCA, AAA, and GAA) yields amino
acids T, A, K, and E with equal probability.
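The amino-acid mix produced by a variegated codon can be computed by enumerating every codon allowed by the nt mixture. This is a minimal sketch assuming the standard genetic code and independent incorporation at each base; the function name `aa_distribution` is hypothetical.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: _AAS[16 * i + 4 * j + k]
        for i, a in enumerate(_BASES)
        for j, b in enumerate(_BASES)
        for k, c in enumerate(_BASES)}


def aa_distribution(mix: List[Dict[str, float]]) -> Dict[str, float]:
    """Amino-acid abundances from a variegated codon.

    mix[0], mix[1], mix[2] map nts to their fractions at bases 1, 2 and 3;
    bases are assumed to be incorporated independently.
    """
    abun: Dict[str, float] = {}
    for (n1, f1), (n2, f2), (n3, f3) in product(*(m.items() for m in mix)):
        aa = CODE[n1 + n2 + n3]
        abun[aa] = abun.get(aa, 0.0) + f1 * f2 * f3
    return abun


# Equimolar A/G at base 1, equimolar C/A at base 2, A at base 3.
print(aa_distribution([{"A": 0.5, "G": 0.5}, {"C": 0.5, "A": 0.5}, {"A": 1.0}]))
# {'T': 0.25, 'K': 0.25, 'A': 0.25, 'E': 0.25}
```

The same function illustrates the constraint discussed under Sec. 13.3: A/C at base 1, A at base 2, and T/G at base 3 yields N, K, H, and Q in equal amounts, so N and Q cannot be mixed at one residue without also admitting K and H.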

Sec. 13.1.3: Choice of residues to vary initially:

The level of variegation that still assures
progressivity determines how many residues can be
varied at once; geometry determines which ones.

The user can pick residues to vary in many ways;
the following is a preferred manner. Pairs of residues
are picked that are diametrically opposed across the
face of the principal set. Two such pairs are used to
delimit the surface, up/down and right/left.
Alternatively, three residues that form an inscribed
triangle, having as large an area as possible, on the
surface are picked. One to three other residues are
picked in a checkerboard fashion across the interaction
surface. Choice of widely spaced residues to vary
creates the possibility for high specificity, because
all the intervening residues must have acceptable
complementarity before favorable interactions can occur
at widely separated residues.
The number of residues picked is coupled to the
range through which each can be varied by the
restrictions discussed in Sec. 13.2. In the first
round, we do not assume any binding between the IPBD and
the target, and so progressivity is not an issue. At
the first round, the user may elect to produce a level
of variegation such that each molecule of vgDNA is
potentially different through, for example, unlimited
variegation of 10 codons (20^10 ≈ 10^13). One run of
the DNA synthesizer produces approximately 10^13
molecules of length 100 nts. Inefficiencies in
ligation and transformation will reduce the number of
proteins actually tested to between 10^7 and 5 x 10^9.
Multiple iterations of the process with such very high
levels of variegation will not yield repeatable
results; the user must decide whether this is
important.

Sec. 13.2: Range of Variation at Each Site of Mutation:

The total level of variegation is the product of
the number of variants at each varied residue. Each
varied residue can have a different scheme of
variegation, producing 2 to 20 different possibilities.
We require that the process be progressive, i.e. each
variegation cycle produces a better starting point for
the next variegation cycle than the previous cycle
produced.
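The product rule for the total level of variegation is a one-liner. This sketch assumes the number of variants allowed at each varied residue is already known; the mixed scheme in the second example is hypothetical.

```python
import math
from typing import Iterable


def total_variegation(variants_per_residue: Iterable[int]) -> int:
    """Total level of variegation: the product of the number of variants
    allowed at each varied residue."""
    return math.prod(variants_per_residue)


# Unlimited variegation of 10 codons (20 amino acids each), as in Sec. 13.1.3.
print(f"{total_variegation([20] * 10):.2e}")  # 1.02e+13
# Hypothetical scheme: four residues through all 20, two charge-varied (T/A/K/E).
print(total_variegation([20, 20, 20, 20, 4, 4]))  # 2560000
```

Comparing this product with the number of independent transformants that can actually be obtained shows at once whether the parental sequence can still be represented.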

N.B.: Setting the level of variegation such that
the PPBD and many sequences related to the PPBD
sequence are present in detectable amounts ensures that
the process is progressive. If the level of
variegation is so high that the PPBD sequence is
present at such low levels that there is an appreciable
chance that no transformant will display the PPBD, then
the best SBD of the next round could be worse than the
PPBD. At excessively high levels of variegation, each
round of mutagenesis is independent of previous rounds,
and there is no assurance of progressivity.

This approach can lead to valuable binding
proteins, but repetition of experiments with this
level of variegation will not yield progressive
results. Excessive variation is not preferred.

If the level of variegation is such that the
parental sequence and each single amino-acid change are
present for selection, then we know that a selected
sequence is closer to optimal than, or the same as, the
parent. If, on the other hand, very high levels of
variegation are used, a sequence may be selected not
because it is superior to the parental sequence, but
because the parental and improved sequences are, by
chance, absent.

Progressivity is not an all-or-nothing property.
So long as most of the information obtained from
previous variegation cycles is retained and many
different surfaces that are related to the PPBD surface
are produced, the process is progressive. If the level
of variegation is so high that the PPBD gene may not be
detected, the assurance of progressivity diminishes.
If the probability of recovering the PPBD is
negligible, then the probability of progressive
behavior is also negligible.


An opposing force in our design considerations is
that PBDs are useful in the population only up to the
amount that can be detected; any excess above the
detectable amount is wasted. Thus we produce as many
surfaces related to the PPBD as possible within the
constraint that the PPBD be detectable.

We defer specification of exactly how much
variegation is allowed until we have: a) specified real
nt distributions for a variegated codon, and b)
examined the effects of discrepancies between specified
nt distributions and actual nt distributions.

Sec. 13.3: Design of vgDNA Encoding PBD Family:

We must now decide how to distribute the
variegation within the codons for the residues to be
varied. These decisions are influenced by the nature
of the genetic code. When vgDNA is synthesized,
variation at the first base of a codon creates a
population containing amino acids from the same column
of the genetic code table (as shown in Table 3-6 on
p87 of WATS87); variation at the second base of the
codon creates a population containing amino acids from
the same row of the genetic code table; variation at
the third base of the codon creates a population
containing amino acids from the same box. If two or
three bases in the same codon are varied, the pattern
is more complicated. Work with 3D protein structural
models may suggest definite sets of amino acids to
substitute at a given residue, but the method of
variation may require that either more or fewer kinds
of amino acids be included. For example, examination of
a model might suggest substitution of N or Q at a given
residue. Combinatorial variation of codons requires
that mixing N and Q at one location also include K and
H as possibilities at the same residue. One must
choose to put: 1) N only, 2) Q only, or 3) a mixture of
N, K, H, and Q. The present invention does not rely on
accurate predictions of which amino acids should be
placed at each residue; rather, attention is focused on
which residues should be varied.

There are many ways to generate diversity in a
protein (see RICH86, CARU85, and OLIP86). One extreme
case is that one or a few residues of the protein are
varied as much as possible (inter alia see CARU85,
CARU87, RICH86, and WHAR86). We will call this limit
"Focused Mutagenesis". Focused Mutagenesis is
appropriate when the IPBD or other PPBD shows little or
no binding to the target, as at the beginning of the
search for a protein to bind to a new target material.
When there is no binding between the PPBD and the
target, we preferably pick a set of five to seven
residues and vary each through all 20 possibilities.

An alternative plan of mutagenesis ("Diffuse
Mutagenesis") is to vary many more residues through a
more limited set of choices (see Vershon et al., Ch. 15
of INOU86, and PAKU86). This can be accomplished by
spiking each of the pure nts activated for DNA
synthesis (e.g. nt-phosphoramidites) with a small
amount of one or more of the other activated nts.
Contrary to general practice, the present invention
sets the level of spiking so that only a small
percentage (1% to 0.00001%, for example) of the final
product contains the initial DNA sequence. Many
single, double, triple, and higher mutations occur, but
recovery of the basic sequence is a possible outcome.
Let N_b be the number of bases to be varied, and let Q
be the fraction of all sequences that should have the
parental sequence; then M, the fraction of the mixture
that is the majority component, is

   M = exp(ln(Q)/N_b) = 10^(log10(Q)/N_b).

If, for example, thirty base pairs on the DNA
chain were to be varied and 1% of the product is to
have the parental sequence, then each mixed nt
substrate should contain 86% of the parental nt and 14%
of other nts. Table 8 shows the fraction (f_n) of DNA
molecules having n non-parental bases when 30 bases are
synthesized with reagents that contain fraction M of
the majority component. When M = 0.63096, f_24 and
higher are less than 10^-8. The entry "most" in Table 8
is the number of changes that has the highest
probability. Note that substantial probability for
multiple substitutions only occurs if the fraction of
parental sequence (f_0) is allowed to drop to around
10^-8.
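The spiking level follows directly from the formula for M. This minimal sketch assumes, as the formula does, that each spiked base is independently parental with probability M; the function name is hypothetical.

```python
import math


def majority_fraction(q: float, n_b: int) -> float:
    """Fraction M of the parental (majority) nt in each spiked reagent so that
    a fraction q of the product keeps the full parental sequence over n_b
    varied bases: M = exp(ln(q)/n_b) = 10**(log10(q)/n_b)."""
    return math.exp(math.log(q) / n_b)


m = majority_fraction(0.01, 30)
print(round(m, 2))  # 0.86 -> 86% parental nt and 14% other nts per reagent
print(m ** 30)      # recovers the parental fraction q = 0.01, up to rounding
```

The two forms of the formula are equivalent; the natural-log form is used here only for convenience.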

Mutagenesis of this sort can be applied to any part of
the protein at any time, but is most appropriate when
some binding to the target has been established. The
N_b base pairs of the DNA chain that are synthesized
with mixed reagents need not be contiguous. They are
picked so that between N_b/3 and N_b codons are affected
to various degrees. The residues picked for mutation
are picked with reference to the 3D structure of the
IPBD, if known. For example, one might pick all or
most of the residues in the principal and secondary
sets. We may impose restrictions on the extent of
variation at each of these residues based on homologous
sequences or other data. The mixture of non-parental
nts need not be random; rather, mixtures can be biased
to give particular amino-acid types specific
probabilities of appearance at each codon. For
example, one residue may contain a hydrophobic amino
acid in all known homologous sequences; in such a case,
the first and third bases of that codon would be
varied, but the second would be set to T. This diffuse
structure-directed mutagenesis will reveal the subtle
changes possible in the protein backbone associated with
conservative interior changes, such as V to I, as well
as some not-so-subtle changes that require concomitant
changes at two or more residues of the protein.

For Focused Mutagenesis, we now consider the
distribution of nts that will be inserted at each
variegated codon. Each codon could be programmed
differently. If we have no information indicating that
a particular amino acid or class of amino acid is
appropriate, we strive to substitute all amino acids
with equal probability, because representation of any
one PBD above the detectable level is wasteful. Equal
amounts of all four nts at each position in a codon
yield the amino-acid distribution in which each amino
acid is present in proportion to the number of codons
that code for it. This distribution has the
disadvantage of giving two basic residues for every
acidic residue. In addition, R, S, and L each occur six
times as often as W or M. If five codons are
synthesized with this distribution, sequences encoding
five Rs are 7776 times (6^5) more abundant than
sequences encoding five Ws. To have W-W-W-W-W present
at detectable levels, we must have R-R-R-R-R present in
7776-fold excess.
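With equal amounts of all four nts at every base, each of the 64 codons is equally likely, so amino-acid abundance is just codon count. A short check, assuming the standard genetic code (written out here in TCAG order; it is not part of the original text):

```python
from collections import Counter

# Standard genetic code, codons in TCAG order; '*' marks stop codons.
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codon_counts = Counter(_AAS)  # codons per amino acid
print(codon_counts["R"], codon_counts["S"], codon_counts["L"])  # 6 6 6
print(codon_counts["W"], codon_counts["M"])                     # 1 1

basic = codon_counts["R"] + codon_counts["K"]   # 6 + 2 = 8
acidic = codon_counts["D"] + codon_counts["E"]  # 2 + 2 = 4
print(basic / acidic)                           # 2.0

# Five codons: R-R-R-R-R versus W-W-W-W-W.
ratio = (codon_counts["R"] / codon_counts["W"]) ** 5
print(ratio)  # 7776.0
```

Here "basic" counts only R and K, matching the acid/base balance used in the next paragraph.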

Let Abun(x) be the abundance of DNA sequences
coding for amino acid x, defined by the distribution of
nts at each base of the codon. For any distribution,
there will be a most-favored amino acid (mfaa) with
abundance Abun(mfaa) and a least-favored amino acid
(lfaa) with abundance Abun(lfaa). We seek the nt
distribution that allows all twenty amino acids and
that yields the largest ratio Abun(lfaa)/Abun(mfaa),
subject to two constraints: equal abundances of acidic
and basic amino acids, and the least possible number of
stop codons. Thus only nt distributions that yield
Abun(E)+Abun(D) = Abun(R)+Abun(K) are considered, and
the function maximized is:

   (1 - Abun(stop)) x (Abun(lfaa)/Abun(mfaa)).

We have simplified the search for an optimal nt
distribution by limiting the third base to T or G (C or
G is equivalent). All amino acids are possible, and the
number of accessible stop codons is reduced because TGA
and TAA codons are eliminated. The amino acids F, Y,
C, H, N, I, and D require T at the third base, while W,
M, Q, K, and E require G. Thus we use an equimolar
mixture of T and G at the third base.


A computer program, written as part of the present
invention and named "Find Optimum vgCodon" (see Table
9), varies the composition at bases 1 and 2, in steps
of 0.05, and reports the composition that gives the
largest value of the quantity
(Abun(lfaa)/Abun(mfaa)) x (1 - Abun(stop)). A vgCodon
is symbolically defined by the nt distribution at each
base:

   base #1:  t1 + c1 + a1 + g1 = 1.0
   base #2:  t2 + c2 + a2 + g2 = 1.0
   base #3:  t3 = g3 = 0.5, c3 = a3 = 0.


The variation of the quantities t1, c1, a1, g1, t2, c2,
a2, and g2 is subject to the constraint that
Abun(E)+Abun(D) equals Abun(K)+Abun(R):

   Abun(E) + Abun(D) = g1*a2
   Abun(K) + Abun(R) = a1*a2/2 + c1*g2 + a1*g2/2

so that

   g1*a2 = a1*a2/2 + c1*g2 + a1*g2/2

Solving for g2, we obtain

   g2 = (g1*a2 - 0.5*a1*a2)/(c1 + 0.5*a1)

In addition,

   t1 = 1 - a1 - c1 - g1
   t2 = 1 - a2 - c2 - g2

We vary a1, c1, g1, a2, and c2 and then calculate t1,
g2, and t2. Initially, variation is in steps of 5%.
Once an approximately optimum distribution of nts is
determined, the region is further explored with steps
of 1%. The logic of this program is shown in Table 9.
The optimum distribution is:

   Optimum vgCodon
                 T      C      A      G
   base #1 =   0.26   0.18   0.26   0.30
   base #2 =   0.22   0.16   0.40   0.22
   base #3 =   0.5    0.0    0.0    0.5

and yields DNA molecules encoding each type of amino
acid with the abundances shown in Table 10.
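Table 9 is not reproduced here, so the following is only a sketch of the quantity that "Find Optimum vgCodon" maximizes, not the program itself. It assumes the standard genetic code and independent bases; the names `abundances`, `score`, and `g2_from_constraint` are hypothetical.

```python
from itertools import product
from typing import Dict, Sequence

# Standard genetic code in TCAG order ('*' = stop); assumed, not from the text.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: _AAS[16 * i + 4 * j + k]
        for i, a in enumerate(_BASES)
        for j, b in enumerate(_BASES)
        for k, c in enumerate(_BASES)}


def abundances(mix: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Abun(x) for every amino acid x ('*' for stop), bases independent."""
    abun: Dict[str, float] = {}
    for (n1, f1), (n2, f2), (n3, f3) in product(*(m.items() for m in mix)):
        aa = CODE[n1 + n2 + n3]
        abun[aa] = abun.get(aa, 0.0) + f1 * f2 * f3
    return abun


def score(mix: Sequence[Dict[str, float]]) -> float:
    """(1 - Abun(stop)) * Abun(lfaa) / Abun(mfaa); 0 if any amino acid is missing."""
    abun = abundances(mix)
    stop = abun.pop("*", 0.0)
    if len(abun) < 20:
        return 0.0
    return (1.0 - stop) * min(abun.values()) / max(abun.values())


def g2_from_constraint(a1: float, c1: float, g1: float, a2: float) -> float:
    """g2 that makes Abun(E)+Abun(D) = Abun(K)+Abun(R) when t3 = g3 = 0.5."""
    return (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)


THIRD = {"T": 0.5, "G": 0.5}
optimum = [{"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30},
           {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22},
           THIRD]
equimolar_12 = [{n: 0.25 for n in _BASES}, {n: 0.25 for n in _BASES}, THIRD]

print(round(score(optimum), 3), round(score(equimolar_12), 3))  # 0.386 0.323
print(round(g2_from_constraint(0.26, 0.18, 0.30, 0.40), 3))     # 0.219
```

A full search in the spirit of the text would loop a1, c1, g1, a2, and c2 over a 0.05 grid, set g2 from the constraint and t1, t2 by difference, discard any negative fraction, and keep the highest score; the g2 of about 0.219 above is consistent with the tabulated 0.22.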

The computer that controls a DNA synthesizer, such
as the Milligen 7500, can be programmed to synthesize
any base of an oligo-nt with any distribution of nts by
taking some nt substrates (e.g. nt phosphoramidites)
from each of two or more reservoirs. Alternatively, nt
substrates can be mixed in any ratios and placed in one
of the extra reservoirs for so-called "dirty bottle"
synthesis.

The actual nt distribution obtained will differ
from the specified nt distribution due to several
causes, including: a) differential inherent reactivity
of nt substrates, and b) differential deterioration of
reagents. It is possible to compensate partially for
these effects, but some residual error will occur. We
denote the average discrepancy between specified and
observed nt fraction as S_err:

   S_err = square root( average[ ((f_obs - f_spec)/f_spec)^2 ] )

where f_obs is the amount of one type of nt found at a
base and f_spec is the amount of that type of nt that
was specified at the same base. The average is over
all specified types of nts and over a number (e.g. 10
or 20) of different variegated bases. By hypothesis,
the actual nt distribution at a variegated base will be
within 5% of the specified distribution. Actual DNA
synthesizers and DNA synthetic chemistry may have
different error levels. It is the user's
responsibility to determine S_err for the DNA
synthesizer and chemistry employed.
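S_err as defined above is a root-mean-square relative error. The sketch below computes it using the document's own numbers, treating the "worst 5% errors" distribution given later in this section as if it had been observed (a hypothetical measurement). The specified-zero fractions c3 and a3 are excluded, since the average runs only over specified nt types.

```python
import math
from typing import Sequence


def s_err(f_obs: Sequence[float], f_spec: Sequence[float]) -> float:
    """RMS relative discrepancy between observed and specified nt fractions,
    averaged over all specified (non-zero) nt types and bases."""
    rel = [(o - s) / s for o, s in zip(f_obs, f_spec) if s > 0]
    return math.sqrt(sum(r * r for r in rel) / len(rel))


# Fractions of T, C, A, G at bases 1, 2 and 3.
spec = [0.26, 0.18, 0.26, 0.30, 0.22, 0.16, 0.40, 0.22, 0.5, 0.0, 0.0, 0.5]
obs = [0.251, 0.189, 0.273, 0.287, 0.209, 0.160, 0.400, 0.231,
       0.475, 0.0, 0.0, 0.525]
print(round(s_err(obs, spec), 3))  # 0.043
```

Every individual relative error in this example is at most 5%, so the RMS value falls somewhat below 0.05.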


To determine the possible effects of errors in nt
composition on the amino-acid distribution, we modified
the program "Find Optimum vgCodon" in four ways:

1) the fraction of each nt in the first two bases
is allowed to vary from its optimum value times
(1 - S_err) to the optimum value times (1 + S_err) in
seven equal steps (S_err is the hypothetical
fractional error level entered by the user); the
sum of nt fractions at one base always equals 1.0;

2) g2 is varied in the same manner as a2, i.e. we
dropped the restriction that Abun(D)+Abun(E) =
Abun(K)+Abun(R);

3) t3 and g3 are varied from 0.5 times (1 - S_err)
to 0.5 times (1 + S_err) in three equal steps;

4) the smallest ratio Abun(lfaa)/Abun(mfaa) is
sought.

In actual experiments, we will direct the synthesizer
to produce the optimum DNA distribution "Optimum
vgCodon" given above. Incomplete control over DNA
chemistry may, however, cause us actually to obtain the
following distribution, which is the worst that can be
obtained if all nt fractions are within 5% of the
amounts specified in "Optimum vgCodon". A
corresponding table can be calculated for any given
S_err using the program "Find worst vgCodon within
S_err of given distribution" given in Table 11.

   Optimum vgCodon, worst 5% errors
                 T       C       A       G
   base #1 =   0.251   0.189   0.273   0.287
   base #2 =   0.209   0.160   0.400   0.231
   base #3 =   0.475   0.0     0.0     0.525

This distribution yields DNA encoding different
amino acids at the abundances shown in Table 12.

If five codons are synthesized with reagents mixed
so as to produce the nt distribution "Optimum vgCodon",
and if we actually obtain the nt distribution
"Optimum vgCodon, worst 5% errors", then DNA sequences
encoding the mfaa at all five codons are about 277
times as likely as DNA sequences encoding the lfaa at
all five codons; about 24% of the DNA sequences will
have a stop codon in one or more of the five codons.

When five codons are synthesized using equimolar
mixtures at bases 1 and 2, (Abun(mfaa)/Abun(lfaa))^5 =
7776. If we program the optimum nt distribution and
come within 5%, then (Abun(mfaa)/Abun(lfaa))^5 = 277.
The total number of different PBDs is unchanged, but
the least-favored sequence is about 28 times more
abundant. Detecting the least-favored amino-acid
sequence when varying four residues with equimolar nts
at each varied base requires as sensitive a separation
system as does detecting the least-favored amino-acid
sequence when varying five residues with the optimized
nt distribution.
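The figures for the worst-case distribution can be recomputed from the table above. This sketch assumes the standard genetic code and independent bases; it also reproduces several of the Table 12 abundances quoted below (for example Abun(F) ≈ 0.0249 and Abun(G) ≈ 0.0663).

```python
from itertools import product
from typing import Dict, Sequence

# Standard genetic code in TCAG order ('*' = stop); assumed, not from the text.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: _AAS[16 * i + 4 * j + k]
        for i, a in enumerate(_BASES)
        for j, b in enumerate(_BASES)
        for k, c in enumerate(_BASES)}


def abundances(mix: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Abun(x) per amino acid ('*' for stop), assuming independent bases."""
    abun: Dict[str, float] = {}
    for (n1, f1), (n2, f2), (n3, f3) in product(*(m.items() for m in mix)):
        aa = CODE[n1 + n2 + n3]
        abun[aa] = abun.get(aa, 0.0) + f1 * f2 * f3
    return abun


# "Optimum vgCodon, worst 5% errors" (fractions of T, C, A, G).
worst = [
    {"T": 0.251, "C": 0.189, "A": 0.273, "G": 0.287},
    {"T": 0.209, "C": 0.160, "A": 0.400, "G": 0.231},
    {"T": 0.475, "G": 0.525},
]
abun = abundances(worst)
stop = abun.pop("*")
mfaa = max(abun, key=abun.get)
lfaa = min(abun, key=abun.get)
ratio5 = (abun[mfaa] / abun[lfaa]) ** 5  # mfaa vs lfaa at all five codons
p_stop5 = 1.0 - (1.0 - stop) ** 5        # at least one stop in five codons

print(mfaa, lfaa)  # R F
print(f"ratio over five codons ~ {ratio5:.1f}; stop in >=1 codon ~ {p_stop5:.0%}")
print(f"equimolar/optimized ~ {7776 / ratio5:.0f}")
```

The ratio comes out near 277.5, which the text reports as about 277; the stop-codon fraction is about 24%, and 7776/277 gives the factor of about 28.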

By hypothesis, the distribution "Optimum vgCodon"
is used in the second version of the second variegation
of hypothetical example 2. The abundance of the DNA
encoding each type of amino acid is, however, taken
from Table 12. The abundance of DNA encoding the
parental amino acid sequence is:

   Amount(parental seq.)
      = Abun(F) x Abun(G) x Abun(D) x Abun(E) x Abun(T)
        (residues F24, G30, D34, E42, T47)
      = .0249 x .0663 x .0545 x .0602 x .0437
      = 2.4 x 10^-7

Therefore, DNA encoding the PPBD sequence, as well as
very many related sequences, will be present in
sufficient quantity to be detected, and we are assured
that the process will be progressive.
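One way to see what an abundance of 2.4 x 10^-7 means in practice is to compare it with the number of transformants. This sketch assumes 10^7 independent transformants (the lower end preferred in Sec. 14.2) and a Poisson model for sampling; both are illustrative assumptions, and the text's own detectability criterion (M_ntv, C_sensi) is defined elsewhere.

```python
import math

# Abundances of the parental residues F24, G30, D34, E42, T47 (Table 12, as quoted).
parental = {"F24": 0.0249, "G30": 0.0663, "D34": 0.0545, "E42": 0.0602, "T47": 0.0437}
p = math.prod(parental.values())
print(f"{p:.1e}")  # 2.4e-07

# Hypothetical library size: 1e7 independent transformants.
n_transformants = 10**7
expected_copies = p * n_transformants
# Poisson approximation (assumed): chance that at least one transformant carries it.
p_present = 1.0 - math.exp(-expected_copies)
print(round(expected_copies, 1), round(p_present, 2))
```

Under these assumptions the PPBD is expected in a few copies and is present with high probability; a larger library raises that probability further.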


A level of variegation that allows recovery of the
PPBD has two properties:

1) we cannot do worse than the PPBD, because the
PPBD is available; and

2) an enormous number of multiple changes related
to the PPBD are available for selection, and we are
able to detect and benefit from these changes.


The user must adjust the list of residues to be
varied and the levels of variegation at each residue
until the calculated variegation is within the bounds
set by M_ntv and C_sensi.

We also consider the interactions between the sites
of variegation and the surrounding DNA. If the method
of mutagenesis to be used is replacement of a cassette,
we consider whether the variegation will generate
gratuitous restriction sites and whether they seriously
interfere with the intended introduction of diversity.
We reduce or eliminate gratuitous restriction sites by
appropriate choice of variegation pattern and silent
alteration of codons neighboring the sites of
variegation. See the Detailed Example.

Sec. 14.1: Insertion of synthetic vgDNA into a
plasmid:

For cassette mutagenesis, restriction sites were
designed and synthesized, and are used to introduce the
synthetic vgDNA into the OCV. Restriction digestions
and ligations are performed by standard methods
(AUSU87). In the case of
single-stranded-oligonucleotide-directed mutagenesis,
synthetic vgDNA is used to create diversity in the
vector (BOTS85).

Sec. 14.2: Transformation of cells:

The present invention is not limited to any one
method of transforming cells with DNA. Standard
methods, such as those described in MANI82, may be
optimized for the particular host cells and OCV. The
goal is to produce a large number of independent
transformants, preferably 10^7 or more. It is not
necessary to isolate transformed cells between
transformation and affinity separation. We prefer to
have transformed cells at high concentration so that
they can be plated densely on relatively few plates.

30 

Sec. 14.3: Growth of the GPfvaPBD) population: 

The transformed cells are grown first under non-selective
conditions that allow expression of plasmid
genes and then selected to kill untransformed cells.
Transformed cells are then induced to express the osp-pbd
gene at the appropriate level of induction, as
determined in Sec. 10.1. The GPs carrying the IPBD are
harvested by a method appropriate to the package.

A high level of diversity can be generated by in
vitro variegated synthesis of DNA, and this diversity
can be maintained passively through several generations
in an organism without positive selective pressure.
Loss or reduction in frequency of deleterious mutations
is advantageous for the purposes of the present
invention. It is preferable that the selection be
performed before more than a few generations elapse.
Moreover, subdividing the variegated population before
amplification in an organism by removing a small sample
(less than 10%) for further work would result in loss
of diversity; therefore, one should use all or most of
the synthetic DNA and most or all of the transformed
cells.
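The cost of subsampling can be estimated with a simple model. The model is our assumption, not one given in the source. If N variants are equally likely and T independent transformants are obtained, the expected fraction of variants present at least once is about 1 - exp(-T/N) (Poisson approximation). Keeping only a 10% sample replaces T by 0.1 T. The numbers below are hypothetical.

```python
import math


def expected_coverage(n_variants: float, n_transformants: float) -> float:
    """Expected fraction of n_variants (assumed equiprobable) represented at
    least once among n_transformants independent transformants (Poisson)."""
    return 1.0 - math.exp(-n_transformants / n_variants)


n = 1.0e7  # hypothetical number of distinct variants
t = 3.0e7  # hypothetical number of independent transformants
full = expected_coverage(n, t)             # use all transformed cells
subsample = expected_coverage(n, 0.1 * t)  # keep only a 10% sample
print(f"all cells: {full:.3f}, 10% sample: {subsample:.3f}")
```

Under these assumptions, using every transformant keeps about 95% of the variants, while a 10% sample keeps about 26%.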

Sec. 15.0: Isolation of GP(PBD)s with binding-to-target phenotypes:

The harvested packages are enriched for the
binding-to-target phenotype by use of affinity
separation involving target material immobilized on a
matrix. Packages that fail to bind to target material
are washed away. If the packages are bacteriophage or
endospores, it may be desirable to include a
bactericidal agent, such as azide, in the buffer to
prevent bacterial growth.


Sec. 15.1: Attaching the target material to a column:

Affinity column chromatography is the preferred
method of affinity separation, but other affinity
separation methods may be used. A variety of
commercially available support materials for affinity
chromatography may be used. These include derivatized
beads to which the target material is covalently
linked, and non-derivatized material to which the target
material adheres irreversibly.

Suppliers of support material for affinity
chromatography include: Applied Protein Technologies,
Cambridge, MA; Bio-Rad Laboratories, Rockville Center,
NY; Pierce Chemical Company, Rockford, IL. Target
materials are attached to the matrix in accord with the
directions of the manufacturer of each matrix
preparation, with attention to good presentation of
the target.

Sec. 15.2: Reducing selection due to non-specific binding:

We reduce non-specific binding of GP(PBD)s to the
matrix that bears the target in two ways:

1) we treat the column with blocking agents, such
as genetically defective GPs or a solution of
protein, before the population of GP(vgPBD)s is
chromatographed, and

2) we pass the population of GP(vgPBD)s over a
matrix containing no target, or a different target
from the same class as the actual target, prior to
affinity chromatography.
Step (1) above saturates any non-specific binding that
the affinity matrix might show toward wild-type GPs or
proteins in general; step (2) removes components of our
population that exhibit non-specific binding to the
matrix or to molecules of the same class as the target.
If the target were horse heart myoglobin, for example,
a column supporting bovine serum albumin could be used
to trap GPs exhibiting PBDs with strong non-specific
binding to proteins. If cholesterol were the target,
then a hydrophobic compound, such as
p-tertiarybutylbenzyl alcohol, could be used to remove
GPs displaying PBDs having strong non-specific binding
to hydrophobic compounds. It is anticipated that PBDs
that fail to fold or that are prematurely terminated
will be non-specifically sticky. The capacity of the
initial column that removes indiscriminately adhesive
PBDs should be greater (e.g. 5-fold greater) than that
of the column that supports the target molecule.

Variation in the support material (polystyrene,
glass, agarose, etc.) in analysis of clones carrying
SBDs is used to eliminate enrichment for packages that
bind to the support material rather than the target.

Sec. 15.3: Eluting the column:

The population of GPs is applied to an affinity
matrix under conditions compatible with the intended
use of the binding protein, and the population is
fractionated by passage of a gradient of some solute
over the column. The process enriches for PBDs having
affinity for the target and for which the affinity for
the target is least affected by the eluants used.
Ions or cofactors needed for stability of PBDs
(derived from IPBD) or target must be included in
buffers at appropriate levels. We first remove
GP(PBD)s that do not bind the target by washing the
matrix with the volume of the initial buffer required
to bring the optical density (at 260 nm or 280 nm) back
to baseline, plus one to five void volumes (V_0). The
column is then eluted with a gradient of increasing: a)
salt, b) [H+] (decreasing pH), c) neutral solutes, d)
temperature (increasing or decreasing), or e) some
combination of these factors. Salt is the most
preferred solute for gradient formation. Other solutes
that generally weaken non-covalent interactions may also
be used. "Salt" includes solutions containing any of
the following ionic species:


Cations: Na+, K+, Li+, Rb+, Cs+, NH4+, Ca++, Mg++,
Sr++, Ba++

Anions: Cl-, Br-, HSO4-, SO4--, H2PO4-, HPO4--,
HCO3-, CO3--, acetate, citrate

Others: guanidinium Cl, standard amino acids,
standard nucleotides


Other ionic or neutral solutes may be used. All
solutes are subject to the necessity that they not kill
the genetic packages. Neutral solutes, such as
ethanol, acetone, ether, or urea, are frequently used
in protein purification; however, many of these are
very harmful to bacteria and bacteriophage above low
concentrations. Bacterial spores, on the other hand,
are impervious to most neutral solutes. Several passes
may be made through the steps in Sec. 15. Different
solutes may be used in different analyses: salt in one,
pH in the next, etc.
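The wash criterion in Sec. 15.3 can be written as a small calculation: wash until the optical density returns to baseline, then add one to five void volumes. This is a sketch under assumptions. OD readings are assumed to be taken at equal volume increments. "Back to baseline" is taken to mean within a chosen tolerance of the baseline. The function name and the example trace are hypothetical.

```python
from typing import Sequence


def wash_volume(od: Sequence[float], step_ml: float, baseline: float, tol: float,
                void_volume_ml: float, extra_void_volumes: float = 1.0) -> float:
    """Volume (mL) of initial buffer to pass: the volume at which the OD first
    returns to within `tol` of `baseline` after the peak of unbound packages,
    plus `extra_void_volumes` void volumes (one to five per the text).
    Reading i is assumed to be taken after (i + 1) * step_ml mL have passed."""
    if not 1.0 <= extra_void_volumes <= 5.0:
        raise ValueError("the text calls for one to five void volumes")
    peak = max(range(len(od)), key=lambda i: od[i])  # index of the unbound-GP peak
    for i in range(peak, len(od)):
        if abs(od[i] - baseline) <= tol:
            return (i + 1) * step_ml + extra_void_volumes * void_volume_ml
    raise ValueError("OD never returned to baseline; continue washing")


# Hypothetical OD(260 nm) trace read every 2 mL on a column with a 3 mL void volume.
trace = [0.05, 0.4, 0.9, 0.6, 0.2, 0.06, 0.05]
print(wash_volume(trace, 2.0, baseline=0.05, tol=0.02, void_volume_ml=3.0, extra_void_volumes=2.0))
```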


Sec. 15.4: Recovery of packages:

Recovery of packages that display binding to an
affinity column may be achieved in several ways,
including from:

1) fractions eluted with a gradient as described
above,

2) fractions eluted with soluble target material,

3) cells grown in situ on the matrix,

4) cells incubated with parts of the matrix,

5) fractions eluted after chemically or
enzymatically degrading the linkage holding the
target to the matrix, and

6) regeneration of GPs after degrading the
packages and recovering OCV DNA.

It is possible to utilize combinations of these
methods. It should be remembered that what we want to
recover from the affinity matrix is not the GPs per se,
but the information in them. Recovery of viable GPs is
very strongly preferred, but recovery of genetic
material is essential.


Inadvertent inactivation of the GPs is very
deleterious. It is preferable to determine the maximum
concentrations of solutes that do not inactivate the
GPs or denature the target or the column. One may use
conditions that denature the column to elute GPs;
before the target is denatured, a portion of the
affinity matrix should be removed for possible use as
an inoculum. As the GPs are held together by
protein-protein interactions and other non-covalent
molecular interactions, there will be cases in which the
molecular package will bind so tightly to the target
molecules on the affinity matrix that the GPs cannot
be washed off in viable form. This will only occur
when very tight binding has been obtained. In these
cases, methods (3) through (5) above can be used to
obtain the bound packages or the genetic messages from
the affinity matrix.

It is possible, by manipulation of the elution
conditions, to isolate SBDs that bind to the target at
one pH (pH_b) but not at another pH (pH_0). The
population is applied at pH_b, and the column is washed
thoroughly at pH_b. The column is then eluted with
buffer at pH_0, and GPs that come off at the new pH are
collected and cultured. Similar procedures may be used
for other solution parameters, such as temperature.
For example, GP(vgPBD)s could be applied to a column
supporting insulin. After eluting with salt to remove
GPs with little or no binding to insulin, we elute with
salt and glucose to liberate GPs that display PBDs that
bind insulin or glucose in a competitive manner.

Sec. 15.5: Amplifying the Enriched Packages:

Viable GPs having the selected binding trait are
amplified by culture in a suitable medium or, in the
case of phage, by infection into a host so cultivated. If
the GPs have been inactivated by the chromatography,
the OCV carrying the osp-pbd gene must be recovered
from the GP and introduced into a new, viable host.

Sec. 15.6: Determining whether further enrichment is needed:
The probability of isolating a GP with improved
binding increases by a factor of C_eff with each
separation cycle. Let N be the number of distinct
amino-acid sequences produced by the variegation. We
want to perform K separation cycles before attempting
to isolate an SBD, where K is such that the probability
of isolating a single SBD is 0.10 or higher. Because
the probability after K cycles is C_eff^K/N, K is the
smallest integer satisfying C_eff^K/N >= 0.10:

$$K = \left\lceil \frac{\log_{10}(0.10\,N)}{\log_{10} C_{eff}} \right\rceil$$

For example, if N were 1.0 x 10^7 and C_eff = 6.31 x 10^2,
then

$$\frac{\log_{10}(1.0\times10^{6})}{\log_{10}(6.31\times10^{2})} = \frac{6.0000}{2.8000} = 2.14 .$$

Therefore we would attempt to isolate SBDs after the
third separation cycle. After only two separation
cycles, the probability of finding an SBD is
(6.31 x 10^2)^2/(1.0 x 10^7) = 0.04, and attempting to
isolate SBDs might be profitable.
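The rule for K can be checked numerically. The sketch below assumes, as the text does, that each separation cycle multiplies the probability of picking an SBD by C_eff, so that after K cycles it is about C_eff^K/N. The function names are hypothetical. The final two lines reproduce the worked example: K = 3, and a probability of about 0.04 after two cycles.

```python
import math


def cycles_needed(n_sequences: float, c_eff: float, target_prob: float = 0.10) -> int:
    """Smallest integer K with c_eff**K / n_sequences >= target_prob,
    i.e. K = ceil(log10(target_prob * N) / log10(C_eff)); never negative."""
    return max(0, math.ceil(math.log10(target_prob * n_sequences) / math.log10(c_eff)))


def prob_after(cycles: int, n_sequences: float, c_eff: float) -> float:
    """Approximate probability of picking an SBD after `cycles` separation cycles."""
    return c_eff ** cycles / n_sequences


print(cycles_needed(1.0e7, 6.31e2))            # prints 3
print(round(prob_after(2, 1.0e7, 6.31e2), 2))  # prints 0.04
```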

Clonal isolates from the last fraction eluted in
Sec. 15.3 containing any viable GPs, as well as clonal
isolates obtained by culturing an inoculum taken from
the affinity matrix, are cultured. If K separation
cycles have been completed, samples from a number, e.g.
32, of these clonal isolates are tested for elution
properties on the {target} column. If none of the
isolated, genetically pure GPs show improved binding to
target, or if K cycles have not yet been completed,
then we pool and culture, in a manner similar to the
manner set forth in Sec. 14.3, the GPs from the last
few fractions eluted (see Sec. 15.4) that contained
viable GPs and the GPs obtained by culturing an
inoculum taken from the column matrix. We then repeat
the enrichment procedure described in Sec. 15. This
cyclic enrichment may continue for N_chrom passes or until
an SBD is isolated.
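The decision made after each separation cycle can be summarized as a small function. This is a hypothetical helper written for illustration. Its argument names and returned strings are ours. The branching follows the paragraph above: test isolates once K cycles are done, verify any improved candidates, otherwise pool and repeat, and stop after N_chrom passes.

```python
from typing import Optional


def plan_after_cycle(cycles_done: int, k: int, n_chrom: int,
                     improved_isolates: Optional[int] = None) -> str:
    """Next step after a separation cycle (hypothetical helper).

    improved_isolates: how many tested clonal isolates (e.g. 32) showed
    improved binding; None if no isolates were tested after this cycle."""
    if cycles_done >= k and improved_isolates is None:
        return "test clonal isolates on the target column"
    if improved_isolates:  # at least one candidate SBD
        return "verify candidates on a different support matrix"
    if cycles_done >= n_chrom:
        return "stop: N_chrom passes completed without an SBD"
    return "pool, culture (Sec. 14.3) and repeat Sec. 15"


print(plan_after_cycle(cycles_done=2, k=3, n_chrom=10))
```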


If one or more of the isolated GPs has improved 
retention on the {target} column, we determine whether 
the retention of the candidate SBDs is due to affinity 
for the target material. Target material is attached 
to a different support matrix at optimal density and 
the elution volumes of candidate GP(SBD)s are measured. 
We pick the candidate that either has the highest 
elution volume or that is retained on the column after 
. If none of the candidate GP(SBD)s has higher 
elution volume than GP(PPBD of this round), then we
pool and culture the GPs from the last few fractions 
that contained viable GPs and the GPs obtained by 
culturing an inoculum taken from the column matrix. We 
then repeat the enrichment procedure of Sec. 15. 


If all of the SBDs show binding that is superior
to the PPBD of this round, we pool and culture the GPs from
the last fraction that contains viable GPs and from the
inoculum taken from the column. This population is
re-chromatographed for at least one pass to fractionate
further the GPs based on K_d.

If an RNA phage were used as GP, the RNA would
either be cultured with the assistance of a helper
phage or be reverse transcribed and the DNA amplified.
The amplified DNA could then be sequenced or subcloned
into suitable plasmids.


Sec. 15.7: Characterizing the Population:

We characterize members of the population showing
desired binding properties by genetic and biochemical
methods. We obtain clonal isolates and test these
strains by genetic and affinity methods to determine
genotype and phenotype with respect to binding to
target. For several genetically pure isolates that
show binding, we demonstrate that the binding is caused
by the artificial chimeric gene by excising the osp-sbd
gene and crossing it into the parental GP. We also
ligate the deleted backbone of each GP from which the
osp-sbd is removed and demonstrate that each backbone
alone cannot confer binding to the target on the GP.
We sequence the osp-sbd gene from several clonal
isolates.

Sec. 15.8: Testing of binding affinity:

For one or more clonal isolates, we subclone the
sbd gene fragment, without the osp fragment, into an
expression vector such that each SBD can be produced as
a free protein. Each SBD protein is purified by normal
means, including affinity chromatography. Physical
measurements of the strength of binding are then made
on each free SBD protein by one of the following
methods: 1) alteration of the Stokes radius as a
function of binding of the target material, measured by
characteristics of elution from a molecular sizing
column such as agarose, 2) retention of radiolabeled
SBD on a spun affinity column to which the target
material has been affixed, or 3) retention of radiolabeled
target material on a spun affinity column to which the
SBD has been affixed. The measurements of binding for
each free SBD are compared to the corresponding
measurements of binding for the PPBD.


In each assay, we measure the extent of binding as
a function of the concentration of each protein and of
other relevant physical and chemical parameters.
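Binding measured as a function of concentration is commonly summarized as a dissociation constant K_d, where a lower K_d means tighter binding. The sketch below assumes a simple single-site model: fraction bound = [L]/(K_d + [L]), where [L] is the free concentration of the binding partner. The source does not specify a fitting procedure. Taking the median of the point-by-point inversions of the model is our illustrative choice, not the authors' method. The data are synthetic, generated from the model with K_d = 2 x 10^-8 M, so the estimate should recover that value.

```python
from statistics import median
from typing import Sequence


def fraction_bound(conc: float, kd: float) -> float:
    """Single-site binding model: fraction of protein bound at free partner conc."""
    return conc / (kd + conc)


def estimate_kd(concs: Sequence[float], fractions: Sequence[float]) -> float:
    """Invert the single-site model at each point (Kd = [L](1 - f)/f) and
    return the median; points with f <= 0 or f >= 1 carry no information."""
    per_point = [c * (1 - f) / f for c, f in zip(concs, fractions) if 0 < f < 1]
    if not per_point:
        raise ValueError("need at least one point with 0 < fraction bound < 1")
    return median(per_point)


# Synthetic data from the model with Kd = 2e-8 M (hypothetical values).
concs = [1e-9, 5e-9, 2e-8, 1e-7, 5e-7]
data = [fraction_bound(c, 2e-8) for c in concs]
print(estimate_kd(concs, data))
```

An SBD from a given round would then be compared with the PPBD by comparing the two estimated K_d values.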

In addition, the SBD with highest affinity for the
target from each round is compared to the best SBD of
the previous round (IPBD for the first round) and to
the IPBD with respect to affinity for the target
material. Successive rounds of mutagenesis and
selection-through-binding yield increasing affinity
until desired levels are achieved.

If binding is not yet sufficient, we must decide 
which residues to vary next (see Sec. 16.0). 

Sec. 15.9: Other Affinity Separation Means:

FACS may be used to separate GPs that bind
fluorescently labeled target with the optimized
parameters determined in Part II. We discriminate
against artifactual binding to the fluorescent label by
using two or more different dyes, chosen to be
structurally different.

Electrophoretic affinity separation uses unaltered
target so that only other ions in the buffer can give
rise to artifactual binding. Artifactual binding to
the gel material gives rise to retardation independent
of field direction and so is easily eliminated. A
variegated population of GPs will have a variety of
charges.

First the variegated population of GPs is
electrophoresed in a gel that contains no target
material. The electrophoresis continues until the GPs
are distributed along the length of the lane. The
target-free lane in which the initial electrophoresis
is conducted is separated by a removable baffle from a
square of gel that contains target material. The
baffle is removed and a second electrophoresis is
conducted at right angles to the first. GPs that do
not bind target migrate with unaltered mobility, while
GPs that do bind target will separate from the majority
that do not bind target. A diagonal line of non-binding
GPs will form. This line is excised and
discarded. Other parts of the gel are dissolved and
the GPs cultured.
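The diagonal logic can be illustrated by classifying positions on the two-dimensional gel. This is a sketch under assumptions. Both dimensions are assumed to be run so that a GP with unchanged mobility travels the same distance in each, which places non-binders on the diagonal. A binder is any GP retarded in the target-containing dimension by more than a chosen tolerance. The function name, tolerance and positions are hypothetical.

```python
from typing import List, Tuple


def off_diagonal(spots: List[Tuple[float, float]], tol: float) -> List[Tuple[float, float]]:
    """Return (first-dimension distance, second-dimension distance) pairs for GPs
    retarded in the target-containing dimension by more than tol. Non-binding GPs
    migrate equally in both dimensions and lie on the diagonal (discarded)."""
    return [(d1, d2) for d1, d2 in spots if d1 - d2 > tol]


# Hypothetical migration distances (cm) in the target-free and target-containing gels.
positions = [(10.0, 10.0), (8.0, 8.1), (9.0, 5.0), (4.0, 1.0)]
print(off_diagonal(positions, tol=0.5))
```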

Sec. 16.0: The Next Variegation Cycle:

Which residues of the PBD should be varied in the
next variegation cycle? The general rule is to
preserve as much accumulated information as possible.
The amino acids just varied are the ones best
determined. The environment of other residues has
changed, so that it is appropriate to vary them again.
Because there are always more residues in the principal
and secondary sets than can be varied simultaneously,
we start by picking residues that either have never
been varied (highest priority) or that have not been
varied for one or more cycles. If we find that varying
all the residues except those varied in the previous
cycle does not allow a high enough level of diversity,
then residues varied in the previous cycle might be
varied again. For example, if the number of
independent transformants that can be produced and the
sensitivity of the affinity separation were such that
seven residues could be varied, and if the principal
and secondary sets contained 13 residues, we would
always vary seven residues, even though that implies
varying some residues twice in a row. In such cases, we
would pick the residues just varied that contain the
amino acids of highest abundance in the variegated
codons used.
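The priority rule can be sketched as a ranking function. This is a hypothetical helper, not the authors' procedure. Residues never varied come first. Residues varied before the previous cycle come next, oldest first. Residues varied in the previous cycle come last. For that last rule we read "the amino acids of highest abundance in the variegated codons used" as follows: re-vary first the residues whose selected amino acid was most abundant in that cycle's codon mix, supplied here as an assumed `abundance` map.

```python
from typing import Dict, List, Optional


def pick_residues(last_varied: Dict[int, Optional[int]], current_cycle: int,
                  n_to_vary: int, abundance: Dict[int, float]) -> List[int]:
    """Choose residues for the next variegation cycle (hypothetical helper).

    last_varied: residue number -> cycle in which it was last varied (None = never).
    abundance: for residues varied in the previous cycle, abundance of the
    selected amino acid in the variegated codons used (assumed input)."""
    prev = current_cycle - 1
    never = sorted(r for r, c in last_varied.items() if c is None)
    older = sorted((r for r, c in last_varied.items() if c is not None and c < prev),
                   key=lambda r: (last_varied[r], r))  # least recently varied first
    just = sorted((r for r, c in last_varied.items() if c == prev),
                  key=lambda r: -abundance.get(r, 0.0))  # highest abundance first
    return (never + older + just)[:n_to_vary]


history = {10: None, 12: 1, 15: 2, 17: 2, 20: None, 34: 1}
print(pick_residues(history, current_cycle=3, n_to_vary=5, abundance={15: 0.05, 17: 0.12}))
```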

It is the accumulation of information that allows
the process to select those protein sequences that
produce binding between the SBD and the target. Some
interfaces between proteins and other molecules involve
twenty or more residues. Complete variation of twenty
residues would generate 20^20, or about 10^26, different
proteins. By dividing the residues that lie close
together in space into overlapping groups of five to
seven residues, we can vary a large surface but never
need to test more than 10^7 to 10^9 candidates at once,
a savings of 10^19 to 10^17 fold.
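The figures above can be checked directly, assuming all 20 amino acids are allowed at every varied residue. Complete variation of 20 residues gives 20^20, about 1.05 x 10^26, proteins. Varying six or seven residues gives 6.4 x 10^7 and 1.28 x 10^9 variants, which matches the 10^7 to 10^9 range in the text. Variegation restricted to fewer amino acids per residue would give smaller libraries.

```python
import math

n_residues = 20
full = 20 ** n_residues  # every amino acid at each of 20 residues
print(f"complete variation: {full:.3e} proteins")

# Library size for one group of five to seven fully varied residues.
for group in (5, 6, 7):
    print(f"{group} residues: {20 ** group:.1e} variants")

# Savings relative to testing 1e7 or 1e9 candidates at once.
for tested in (1e7, 1e9):
    print(f"testing {tested:.0e} at once: savings of about 10^{round(math.log10(full / tested))}")
```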

Having picked the residues to vary, we again set
the range of variegation for each residue according to
the principles set forth in Sec. 13.2, design the vgDNA
encoding the desired mutants (Sec. 13.3), clone the
vgDNA into GPs (Sec. 14), and select-by-binding-to-target
those GPs bearing SBDs (Sec. 15).

Sec. 17.0: OTHER CONSIDERATIONS:

Sec. 17.1: Joint selections:

One may modify the affinity separation of the
method described to select a molecule that binds to
material A but not to material B. One needs to prepare
two selection columns, one with material A and the
other with material B. The population of genetic
packages is prepared in the manner described, but
before applying the population to A, one passes the
population over the B column so as to remove those
members of the population that have high affinity for
B. It may be necessary to amplify the population that
does not bind to B before passing it over A.
Amplification would most likely be needed if A and B
were in some ways similar and the PPBD had been
selected for having affinity for A.

For example, to obtain an SBD that binds A but not
B, three columns could be connected in series: a) a
column supporting some compound that is neither A nor B,
or only the matrix material, b) a column supporting B, and
c) a column supporting A. A population of GP(vgPBD)s
is applied to the series of columns and the columns are
washed with the buffer of constant ionic strength that
is used in the application. The columns are uncoupled,
and the third column is eluted with a gradient to
isolate GP(PBD)s that bind A but not B.
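The effect of the three columns in series can be modeled as successive filters. This is an idealized sketch: binding is treated as all-or-none, and the package records and names are hypothetical. Packages that bind the matrix are caught on column (a). Packages that bind B are caught on column (b). Only packages that bind A and escaped both earlier columns remain on column (c) for gradient elution.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Package:
    name: str
    binds_matrix: bool
    binds_b: bool
    binds_a: bool


def serial_columns(population: List[Package]) -> List[Package]:
    """Packages left on column (c) after the population passes (a), (b), (c) in series."""
    after_a = [p for p in population if not p.binds_matrix]  # (a) matrix or irrelevant compound
    after_b = [p for p in after_a if not p.binds_b]          # (b) column supporting B
    return [p for p in after_b if p.binds_a]                 # (c) column supporting A


pop = [Package("sticky", True, True, True), Package("b_binder", False, True, True),
       Package("a_only", False, False, True), Package("inert", False, False, False)]
print([p.name for p in serial_columns(pop)])
```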

One can also generate molecules that bind to both
A and B. In this case we use a 3D model and mutate one
face of the molecule in question to get binding to A.
We then mutate a different face to produce binding to
B.

The materials A and B could be proteins that
differ at only one or a few residues. For example, A
could be a natural protein for which the gene has been
cloned and B could be a mutant of A that retains the
overall 3D structure of A. SBDs selected to bind A but
not B must bind to A near the residues that are mutated
in B. If the mutations were picked to be in the active
site of A (assuming A has an active site), then an SBD
that binds A but not B will bind to the active site of
A and is likely to be an inhibitor of A.

To obtain a protein that will bind to both A and
B, we can, alternatively, first obtain an SBD that
binds A and a different SBD that binds B. We can then
combine the genes encoding these domains so that a
two-domain single-polypeptide protein is produced. The
fusion protein will have affinity for both A and B.

One can also generate binding proteins with
affinity for both A and B, such that these materials
compete for the same site on the binding protein. We
guarantee competition by overlapping the sites for A
and B. We first create a molecule that binds to target
material A. We then vary a set of residues defined as:
a) those residues that were varied to obtain binding to
A, plus b) those residues close in 3D space to the
residues of set (a) but that are internal and so are
unlikely to bind directly to either A or B. Residues
in set (b) are likely to make small changes in the
positioning of the residues in set (a) such that the
affinities for A and B will be changed by small
amounts. Members of these populations are selected for
affinity to both A and B.

Sec. 17.2: Selection for non-binding:

The method of the present invention can be used to
select proteins that do not bind to selected targets.
Consider a protein of pharmacological importance, such
as streptokinase, that is antigenic to an undesirable
extent. We can take the pharmacologically important
protein as IPBD and antibodies against it as target.
Residues on the surface of the pharmacologically
important protein would be variegated, and GP(PBD)s that
do not bind to an antibody column would be collected
and cultured. Surface residues may be identified in
several ways, including: a) from a 3D structure, b)
from hydrophobicity considerations, or c) from chemical
labeling. The 3D structure of the pharmacologically
important protein remains the preferred guide to
picking residues to vary, except now we pick residues
that are widely spaced so that we leave as little as
possible of the original surface unaltered.


Destroying binding frequently requires only that a
single amino acid in the binding interface be changed.
If polyclonal antibodies are used, we face the problem
that all or most of the strong epitopes must be altered
in a single molecule. Preferably, one would have a set
of monoclonal antibodies, or a narrow range of antibody
species. If we had a series of monoclonal antibody
columns, we could obtain one or more mutations that
abolish binding to each monoclonal antibody. We could
then combine some or all of these mutations in one
molecule to produce a pharmacologically important
protein recognized by none of the monoclonal
antibodies. Such mutants must be tested to verify that
the pharmacologically interesting properties have not
been altered to an unacceptable degree by the mutations.
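Choosing which escape mutations to combine can be framed as a covering problem: every monoclonal antibody in the panel should lose binding, and we would like as few mutations as possible. The source does not prescribe a selection rule. The greedy set cover below is our illustrative choice, and the mutation names and antibody sets are hypothetical. In practice one would also avoid combining two mutations at the same residue and, as the text requires, retest the pharmacological properties of the combined mutant.

```python
from typing import Dict, List, Set


def combine_escape_mutations(escapes: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: repeatedly pick the mutation that abolishes binding to
    the most antibodies not yet escaped, until every antibody is covered.
    escapes maps a mutation name to the antibodies whose binding it abolishes."""
    uncovered: Set[str] = set().union(*escapes.values())
    chosen: List[str] = []
    while uncovered:
        best = max(escapes, key=lambda m: len(escapes[m] & uncovered))
        chosen.append(best)
        uncovered -= escapes[best]
    return chosen


# Hypothetical escape mutations found on a panel of three monoclonal antibody columns.
panel = {"K15A": {"mAb1"}, "R17E": {"mAb1", "mAb2"}, "Y35G": {"mAb3"}, "E7K": {"mAb2"}}
print(combine_escape_mutations(panel))
```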




Typically, polyclonal antibodies display a range
of binding constants for antigen. Even if we have only
polyclonal antibodies that bind to the
pharmacologically important protein, we may proceed as
follows. We engineer the pharmacologically important
protein to appear on the surface of a replicable GP.
We introduce mutations into residues that are on the
surface of the pharmacologically important protein, or
into residues thought to be on the surface of the
pharmacologically important protein, so that a
population of GPs is obtained. Polyclonal antibodies
produce?", and "How small a component can we find
through selection-through-binding?". Geneticists
routinely find mutations with frequencies of one in
10^10 using simple, powerful selections. The optimum
level of variegation is determined by the maximum
number of transformants and the selection sensitivity,
so that for any reasonable sensitivity we may use a
progressive process to obtain a series of proteins with
higher and higher affinity for the chosen target
material. Enrichments of 1000-fold by a single pass of
elution from an affinity plate have been demonstrated
(SMIT85).

Use of different variation schemes can yield
different binding proteins. For any given target, a
large plurality of proteins will bind to it. Thus, if
one binding protein turns out to be unsuitable for some
reason (e.g. too antigenic), the procedure can be
repeated with different variation parameters. For
example, one might choose different residues to vary or
pick a different nt distribution at variegated codons
so that a new distribution of amino acids is tested at
the same residues. Even if the same principal set of
residues is used, one might obtain a different SBD if
the order in which one picks subsets to be varied is
altered.

Sec. 17.5: Other modes of mutagenesis possible:

The modes of creating diversity in the population
of GPs discussed herein are not the only modes
possible. Any method of mutagenesis that preserves at
least a large fraction of the information obtained from
one selection and then introduces other mutations in
the same domain will work. The limiting factors are
the number of independent transformants that can be
produced and the amount of enrichment one can achieve
through affinity separation. Therefore the preferred
embodiment uses a method of mutagenesis that focuses
mutations into those residues that are most likely to
affect the binding properties of the PBD and are least
likely to destroy the underlying structure of the IPBD.

Other modes of mutagenesis might allow other GPs
to be considered. For example, the bacteriophage
lambda is not a useful cloning vehicle for cassette
mutagenesis because of the plethora of restriction
sites. One can, however, use single-stranded-oligo-nt-directed
mutagenesis on lambda without the need for
unique restriction sites. No one has used
single-stranded-oligo-nt-directed mutagenesis to introduce the
high level of diversity called for in the present
invention, but if it is possible, such a method would
allow use of phage with large genomes.

Example 1


BPTI-Derived Binding Protein for HHMb; Displayed by M13
Phage

Presented below is a hypothetical example of a
protocol for developing a new binding molecule derived
from BPTI with affinity for horse heart myoglobin
(HHMb) using the common E. coli bacteriophage M13 as
genetic package. It will be understood that some
further optimization, in accordance with the teachings
herein, may be necessary to obtain the desired results.
Possible modifications in the preferred method are
discussed immediately following various steps of the
hypothetical example.

By hypothesis, we set the following technical
capabilities:

Y_Dq      500 ng/synthesis of ssDNA 100 bases long,
          10 ug/synthesis of ssDNA 60 bases long,
          1 mg/synthesis of ssDNA 20 bases long
M_dna     100 bases
Y_pl      1 mg/l
I_ef      0.1 % for blunt-blunt,
          4 % for sticky-blunt,
          11 % for sticky-sticky
M_ntv     5 x 10^8
C_eff     900-fold enrichment
C_sensi   1 in 4 x 10®
N_chrom   i® passes
S         0.05


Example 1. Part I

In this example, we will use M13 as a replicable
GP and BPTI as IPBD. In Part I, we are concerned only
with getting BPTI displayed on the outer surface of an
M13 derivative. Variable DNA may be introduced in the
osp-ipbd gene, but not within the region that codes for
the trypsin-binding region of BPTI. Once BPTI is
displayed on the outer surface of an M13
derivative, we proceed to Part II to optimize the
affinity separation procedures.


For this example, we choose a filamentous
bacteriophage of E. coli, M13. We prefer phage over
vegetative bacterial cells because phage are much less
metabolically active. We prefer phage over spores
because the molecular mechanisms of virion
formation and the 3D structure of the virion are much
better understood than are the corresponding processes
of spore formation and structures of spores.

M13 is a very well studied bacteriophage, widely
used for DNA sequencing and as a genetic vector; it is
a typical member of the class of filamentous phages.
The relevant facts about M13 and other phages that will
allow us to choose among phages are cited in Sec.
1.3.1.


Compared to other bacteriophage, filamentous phage
in general are attractive and M13 in particular is
especially attractive because:

1) the 3D structure of the virion is known,

2) the processing of the coat protein is well
understood,

3) the genome is expandable,

4) the genome is small,

5) the sequence of the genome is known,

6) the virion is physically resistant to shear,
heat, cold, guanidinium Cl, low pH, and high salt,

7) the phage is a sequencing vector so that
sequencing is especially easy, and

8) antibiotic-resistance genes have been cloned
into the genome with predictable results (HINE80).


Other criteria listed in Secs. 1.0 and 1.3 are
also satisfied: M13 is easily cultured and stored
(FRIT85), each infected cell yielding 100 to 1000 M13
progeny after infection. M13 has no unusual or
expensive media requirements and is easily harvested
and concentrated (SALI64, YAMA70, FRIT85). M13 is
stable toward physical agents: temperature (10% of
phage survive 30 minutes at 85°C), shear (Waring
blender does not kill), desiccation (not applicable),
radiation (not applicable), age (stable for years).

M13 is stable toward chemicals: pH (< 2.2
(SMIT85)), surface active agents (not applicable),
chaotropes (guanidinium HCl = 6.0 M), ions (no specific
sensitivities), organic solvents (ether and other
organic solvents are lethal (MARV78)), proteases (not
applicable, HHMb not a protease). M13 is not known to
be sensitive to other enzymes.

The M13 genome is 6423 b.p., and the sequence is known
(SCHA78). Because the genome is small, cassette
mutagenesis is practical on RF M13 (AUSU87), as is
single-stranded oligo-nt-directed mutagenesis (FRIT85).
M13 is a plasmid and transformation system in itself,
and an ideal sequencing vector. M13 can be grown on
Rec- strains of E. coli. The M13 genome is expandable
(MESS78, FRIT85). M13 confers no advantage, but
does not lyse cells. The sequence of gene VIII is
known, and the amino acid sequence can be encoded on a
synthetic gene, using the lacUV5 promoter, and used in
conjunction with the LacI^q repressor. The lacUV5
promoter is induced by IPTG. Gene VIII protein is
secreted by a well studied process and is cleaved
between A23 and A24. Residues 18, 21, 22, and 23 of
gene VIII protein control cleavage. Mature gene VIII
protein makes up the sheath around the circular ssDNA.
The 3D structure of the f1 virion is known at medium
resolution; the amino terminus of gene VIII protein is
on the surface of the virion. No fusions to M13 gene VIII
protein have been reported. The 2D structure of M13
coat protein is implicit in the 3D structure. Mature
M13 gene VIII protein has only one domain. There are
four minor proteins: gene III, VI, VII, and IX. Each
of these minor proteins is present in about 5 copies
per virion and is related to morphogenesis or
infection. The major coat protein is present in more
than 2500 copies per virion.

Although no fusions of M13 gene VIII to other
genes have been reported, knowledge of the virion 3D
structure (BANN81) makes attachment of the IPBD to the
amino terminus of mature M13 coat protein (M13 CP)
quite attractive. Should direct fusion of BPTI to M13
CP fail to cause BPTI to be displayed on the surface of
M13, we will vary part of the BPTI sequence and/or
insert short random DNA sequences between BPTI and M13
CP.

Smith (SMIT85) and de la Cruz et al. (CRUZ88) have
shown that insertions into gene III cause novel protein
domains to appear on the virion outer surface. If BPTI
cannot be made to appear on the virion outer surface
by fusing the bpti gene to the m13cp gene, we will fuse
bpti to gene III either at the site used by Smith and
by de la Cruz et al. or to one of the termini. We will
use a second, synthetic copy of gene III so that some
unaltered gene III protein will be present.

The gene VIII protein is chosen as OSP because it
is present in many copies and because its location and
orientation in the virion are known. Note that any
uncertainty about the azimuth of the coat protein about
its own alpha helical axis is unimportant.

The 3D model of f1 indicates strongly that fusing
BPTI to the amino terminus of M13 CP is more likely to
yield a functional protein than any other fusion site
(See Sec. 1.3.3).
The amino-acid sequence of M13 pre-coat (SCHA78),
called AA_seq1, is

AA_seq1 (residues 1-73; ↓ marks the SP-I cleavage site after A23)

  1-50:   MKKSLVLKASVAVATLVPMLSFA↓AEGDDPAKAAFNSLQASATEYIGYAWA
  51-73:  MVVVIVGATIGIKLFKKFTSKAS

The single-letter codes for amino acids and the codes
for ambiguous DNA are internationally recognized
(GEOR87). The best site for inserting a novel protein
domain into M13 CP is after A23 because SP-I cleaves
the precoat protein after A23, as indicated by the
arrow. Proteins that can be secreted will appear
connected to mature M13 CP at its amino terminus.
Because the amino terminus of mature M13 CP is located
on the outer surface of the virion, the introduced
domain will be displayed on the outside of the virion.

BPTI is chosen as the IPBD of this example (See Sec.
2.1) because it meets or exceeds all the criteria: it
is a small, very stable protein with a well known 3D
structure. Marks et al. (MARK86) have shown that a
fusion of the phoA signal peptide gene fragment and DNA
coding for the mature form of BPTI caused native BPTI
to appear in the periplasm of E. coli, demonstrating
that there is nothing in the structure of BPTI to
prevent its being secreted.


Marks et al. (MARK87) also showed that the
structure of BPTI is stable even to the removal of one
of the cystine bridges. They did this by replacing
both C14 and C38 with either two alanines or two
threonines. The C14/C38 cystine bridge that Marks et
al. removed is the one very close to the scissile bond
in BPTI; surprisingly, both mutant molecules
functioned as trypsin inhibitors. This indicates that
BPTI is redundantly stable and so is likely to fold
into approximately the same structure despite numerous
surface mutations. Using the knowledge of homologues
(vide infra), we can infer which residues must not be
varied if the basic BPTI structure is to be maintained.

The 3D structure of BPTI has been determined at
high resolution by X-ray diffraction (HUBE77, MARQ83,
WLOD84, WLOD87a, WLOD87b), neutron diffraction
(WLOD84), and NMR (WAGN87). In one of the X-ray
structures deposited in the Brookhaven Protein Data
Bank, "6PTI", there was no electron density for A58,
indicating that A58 has no uniquely defined
conformation. Thus we know that the carboxy group does
not make any essential interaction in the folded
structure. The amino terminus of BPTI is very near to
the carboxy terminus. Goldenberg and Creighton
reported on circularized BPTI and circularly permuted
BPTI (GOLD83). Some proteins homologous to BPTI have
more or fewer residues at either terminus.

BPTI has been called "the hydrogen atom of protein
folding" and has been the subject of numerous
experimental and theoretical studies (STAT87, SCHW87,
GOLD83, CHAZ83).

BPTI has the added advantage that at least 32
homologous proteins are known, as shown in Table 13. A
tally of ionizable groups is shown in Table 14 and the
composite of amino acid types occurring at each residue
is shown in Table 15.
BPTI is freely soluble and is not known to bind
metal ions. BPTI has no known enzymatic activity.
BPTI binds to trypsin with Kd = 6.0 x 10^-14 M (TSCH87).
BPTI is not toxic. If K15 of BPTI is changed to L,
there is no measurable binding between the mutant BPTI
and trypsin (TSCH87).

All of the conserved residues are buried; of the
seven fully conserved residues only G37 has noticeable
exposure. The solvent accessibility of each residue in
BPTI is given in Table 16, which was calculated from the
entry "6PTI" in the Brookhaven Protein Data Bank with a
solvent radius of 1.4 Å, the atomic radii given in
Table 7, and the method of Lee and Richards (LEEB71).
Each of the 51 non-conserved residues can accommodate
two or more kinds of amino acids. By independently
substituting at each residue only those amino acids
already observed at that residue, we could obtain
approximately 7 x 10^42 different amino acid sequences,
most of which will fold into structures very similar to
BPTI.
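The figure of about 7 x 10^42 is a product over residues: if residue i can take n_i of the amino acids observed there, independent substitution gives n_1 x n_2 x ... distinct sequences. A minimal Python sketch of that count follows. The per-residue counts in the toy call are hypothetical (Table 15 is not reproduced here), and `sequence_space` and `mean_choices` are illustrative names. Working backwards from the stated total shows what the 51 non-conserved residues imply on average.

```python
import math

def sequence_space(choices_per_residue):
    """Number of sequences when each residue independently takes one of its allowed amino acids."""
    return math.prod(choices_per_residue)

def mean_choices(total, n_residues):
    """Geometric-mean number of choices per residue implied by a total sequence count."""
    return 10 ** (math.log10(total) / n_residues)

# Toy illustration with hypothetical per-residue counts (not Table 15 data).
print(sequence_space([2, 3, 4]))          # 24

# About 7 x 10^42 sequences over 51 non-conserved residues implies roughly
# 6.9 observed amino acid types per residue (geometric mean).
print(round(mean_choices(7e42, 51), 1))
```

The geometric mean, not the arithmetic mean, is the right average here because the total is a product of per-residue counts.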

BPTI will be useful as an IPBD for macromolecules
(See Sec. 2.1.1). BPTI and BPTI homologues bind tightly
and with high specificity to a number of enzymes.

BPTI is strongly positively charged except at very
high pH; thus BPTI is useful as an IPBD for targets that
are not also strongly positive under the conditions of
intended use (see Sec. 2.1.2). There exist homologues
of BPTI, however, having quite different charges (viz.
SCI-III from Bombyx mori at -7 and the trypsin
inhibitor from bovine colostrum at -1). Once a
derivative of M13 is found that displays BPTI on its
surface, the sequence of the BPTI domain can be
replaced by one of the homologous sequences to produce
acidic or neutral IPBDs.

BPTI is not an enzyme (See Sec. 2.1.3). BPTI is
quite small; if this should cause a pharmacological
problem, two or more BPTI-derived domains may be joined
as in the human BPTI homologue that has two domains.

A derivative of M13 is the preferred OCV (See
Sec. 3). A "phagemid" is a hybrid between a phage and
a plasmid, and is used in this invention. Double-stranded
plasmid DNA isolated from phagemid-bearing
cells is denoted by the standard convention, e.g.
pXY24. Phage prepared from these cells would be
designated XY24. Phagemids such as Bluescript K/S
(sold by Stratagene) are not suitable for our purposes
because Bluescript does not contain the full genome of
M13 and must be rescued by coinfection with helper
phage. Such coinfections could lead to genetic
recombination yielding heterogeneous phage unsuitable
for the purposes of the present invention.

The bacteriophage M13 bla 61 (ATCC 37039) is
derived from wild-type M13 through the insertion of the
beta lactamase gene (HINE80). This phage contains 8.13
kb of DNA. M13 bla cat 1 (ATCC 37040) is derived from
M13 bla 61 through the additional insertion of the
chloramphenicol resistance gene (HINE80); M13 bla cat 1
contains 9.88 kb of DNA. Although neither of these
variants of M13 contains the ColE1 origin of
replication, either could be used as a starting point
to construct a usable cloning vector for the present
example.
The OCV for the current example is constructed by
a process illustrated in Figure 4. A brief description
of all the plasmids and phagemids constructed for this
Example is found in Table 17.

For ss oligo-nt site-directed mutagenesis,
multiple primers lead to higher efficiency. Three non-mutagenic
primers are used: bases 2326-2352 of wt M13,
bases 4854-4875 of wt M13, and the complement of bases
3431-3451 of pBR322. Note that pLG2 and its
derivatives carry the anti-sense strand of the amp^R
gene in the + DNA strand. The segments are picked to
be high in GC content and to divide the pLG7 genome
into several segments of approximately equal length.

The genetic engineering procedures needed to
construct the OCV are standard, using commercially
available restriction enzymes under recommended
conditions. All restriction fragments of DNA are
purified by electrophoresis or HPLC. M13 and its
engineered derivatives are infected into E. coli strain
PE384 (F+, Rec-, Sup+, Amp^S). Plasmid DNA of M13
derivatives is transformed into E. coli strain PE383 (F-,
Rec-, Sup+, Amp^S) so that we avoid multiple rounds of
infection in the culture. Isolation of M13 phage is by
the procedure of Salivar et al. (SALI64); isolation of
replicative form (RF) M13 is by the procedure of
Jazwinski et al. (JAZW73a and JAZW73b). Isolation of
plasmids containing the ColE1 origin of replication is
by the method of Maniatis (MANI82).

We pick the amp^R gene from pBR322 as a convenient
antibiotic resistance gene. Another resistance gene,
such as the kanamycin resistance gene, could be used. The AccI-to-AatII
fragment of pBR322 is a conveniently obtained source of
amp^R and the ColE1 origin.

M13mp18 (New England BioLabs) contains neither AatII
nor AccI sites. Therefore we insert an adaptor
that allows us to insert the AatII-to-AccI fragment
of pBR322 that carries the amp^R gene and the ColE1
origin of replication into a desirable place in
M13mp18. M13mp18 contains a lacUV5 promoter and a lacZ
gene that are not useful to the purposes of the present
invention. By cutting M13mp18 with AvaII and Bsu36I
and discarding the approximately 600 intervening base
pairs, we eliminate all recognition sites of several
enzymes useful for engineering the bpti-gene VIII gene.

The following adaptor is synthesized:

5' GACCGACGTCtgcctcGTATACCGGACCGcatagctCC    3'  olig#1
3'    GCTGCAGacggagCATATGGCCTGGCgtatcgaGGACT 5'  olig#2
   AvaII | AatII | AccI | RsrII | Bsu36I
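As a sanity check on the adaptor as printed, the following Python sketch finds how olig#2 pairs with olig#1 and reports the single-stranded ends and the positions of the internal sites. It assumes olig#2 is written 3'->5' as shown and that a perfect duplex is expected; `anneal` and the constant names are illustrative, and the sketch does not handle a top strand that overhangs at its 3' end.

```python
# Adaptor strands as printed; lower case marks filler bases.
OLIG1 = "GACCGACGTCtgcctcGTATACCGGACCGcatagctCC"   # written 5'->3'
OLIG2 = "GCTGCAGacggagCATATGGCCTGGCgtatcgaGGACT"   # written 3'->5'

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def anneal(top_5to3, bottom_3to5, min_overlap=10):
    """Find where the bottom strand sits under the top strand.

    Returns (offset, top 5' overhang, bottom 5' overhang read 5'->3').
    """
    top, bot = top_5to3.upper(), bottom_3to5.upper()
    for offset in range(len(top) - min_overlap + 1):
        overlap = min(len(top) - offset, len(bot))
        if all(PAIR[top[offset + i]] == bot[i] for i in range(overlap)):
            # Bottom bases beyond the duplex lie at its 5' end; reverse to read 5'->3'.
            return offset, top[:offset], bot[overlap:][::-1]
    raise ValueError("no perfect duplex found")

print(anneal(OLIG1, OLIG2))          # (3, 'GAC', 'TCA')

top = OLIG1.upper()
for name, site in {"AatII": "GACGTC", "AccI (GTATAC form)": "GTATAC",
                   "RsrII (CGGACCG form)": "CGGACCG"}.items():
    print(name, top.find(site))      # 4, 16, 22
```

The 5'-GAC end matches the 5'-GWC overhang left by AvaII (G^GWCC), and the 5'-TCA end matches the 5'-TNA overhang left by Bsu36I (CC^TNAGG), consistent with ligating the adaptor into M13mp18 cut with both enzymes.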

The annealed adaptor is ligated with RF M13mp18
that has been cut with both AvaII and Bsu36I and
purified by PAGE or HPLC. Transformed cells are
selected for plasmid uptake with ampicillin. The
resulting construct is called pLG1.

DNA from pLG1 is cut with both AatII and AccI. The
AatII-to-AccI fragment of pBR322 is ligated to the
backbone of pLG1. The correct construct is named pLG2.

The AccI restriction site is no longer needed for
vector construction. To eliminate this site, RF pLG2
dsDNA is cut with AccI, treated with Klenow fragment
and dATP and dTTP to make it blunt, and then religated.
The cloning vector, named pLG3, is now ready for
stepwise insertion of the osp-ipbd gene.





We are now ready to design a gene (See Sec. 4)
that will cause BPTI domains to appear on the outer
surface of an M13 derivative: LG7.

To obtain a novel protein domain attached to the
outside of M13, we insert DNA that codes for mature
BPTI after A23 of the precoat protein of M13. Mature
BPTI begins with an arginine residue, which is charged;
cleavage by signal peptidase I is normal in such cases.
Signal peptidase I (SP-I) cuts a chimera of M13 coat
protein and BPTI after A23, leaving mature BPTI attached
at its carboxy end to the amino terminus of M13 CP.


The following amino-acid sequence, called AA_seq2,
is constructed by inserting the sequence for mature
BPTI (residues 24-81 of AA_seq2) immediately after the signal
sequence of M13 precoat protein (indicated by the
arrow) and before the sequence for the M13 CP.

AA_seq2 (signal sequence 1-23; ↓ = SP-I cleavage site; mature BPTI 24-81; M13 CP 82-131)

  1-50:    MKKSLVLKASVAVATLVPMLSFA↓RPDFCLEPPYTGPCKARIIRYFYNAKA
  51-100:  GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAAEGDDPAKAAFNSLQASAT
  101-131: EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS


Sequence numbers of fusion proteins refer to the
fusion, as coded, unless otherwise noted. Thus the
alanine that begins M13 CP is referred to as "number
82", "number 1 of M13 CP", or "number 59 of the mature
BPTI-M13 CP fusion".
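The numbering convention can be made concrete with a short Python sketch that rebuilds AA_seq2 from its three parts and maps a fusion position back to its domain. The segment sequences are transcribed from AA_seq1 and AA_seq2 (reading the transmembrane stretch as MVVVIVG, which gives the 73 and 131 residues implied by the position rulers); the function and constant names are hypothetical.

```python
# Segments of AA_seq2, transcribed from AA_seq1/AA_seq2 in the text.
SIGNAL = "MKKSLVLKASVAVATLVPMLSFA"                                   # residues 1-23
BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"  # mature BPTI, 58 aa
M13_CP = "AEGDDPAKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"        # mature M13 CP, 50 aa

AA_SEQ2 = SIGNAL + BPTI + M13_CP

DOMAINS = (
    ("signal", SIGNAL, 1),
    ("BPTI", BPTI, len(SIGNAL) + 1),
    ("M13 CP", M13_CP, len(SIGNAL) + len(BPTI) + 1),
)

def fusion_to_domain(n):
    """Map a 1-based AA_seq2 position to (domain name, 1-based position in that domain)."""
    for name, seq, start in DOMAINS:
        if start <= n < start + len(seq):
            return name, n - start + 1
    raise ValueError(f"position {n} is outside AA_seq2 (1-{len(AA_SEQ2)})")

def mature_fusion_number(n):
    """Number in the mature BPTI-M13 CP fusion, i.e. after SP-I removes residues 1-23."""
    return n - len(SIGNAL)

print(len(AA_SEQ2))                                # 131
print(fusion_to_domain(82))                        # ('M13 CP', 1)
print(mature_fusion_number(82))                    # 59
print(fusion_to_domain(28), fusion_to_domain(78))  # ('BPTI', 5) ('BPTI', 55)
```

The last line reproduces the later convention that BPTI C5 and C55 are always residues 28 and 78 of the fusion.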

The osp-ipbd gene is regulated by the lacUV5
promoter and terminated by the trpA transcription
terminator. The host strain of E. coli harbors the
lacI^q gene. The osp-ipbd gene is expressed and
processed in parallel with the wild-type gene VIII.
The novel protein, which consists of BPTI tethered to an
M13 CP domain, constitutes only a fraction of the coat.
Affinity separation is able to separate phage carrying
only five or six copies of a molecule that has high
affinity for an affinity matrix (SMIT85); 1%
incorporation of the chimeric protein results in about
30 copies of the protein exposed on the surface. If
this is insufficient, additional copies may be provided
by, for example, increasing the IPTG concentration.

A model comprising M13 coat, after the model for
f1 of Marvin and colleagues (BANN81), and a BPTI
domain, taken from the Brookhaven Protein Data Bank
entry "6PTI", was constructed by standard model
building methods that ensure that covalent bond lengths
and angles are close to acceptable values. The model
shows that the fusion protein could fit into the
supramolecular structure in a stereochemically
acceptable fashion without disturbing the internal
structure of either the M13 CP or BPTI domain.

The ambiguous DNA sequence coding for AA_seq2 is
examined by a computer program for places where
recognition sites for restriction enzymes could be
created without altering the amino-acid sequence (See
Sec. 4.3). A master table of enzymes is compiled from
the catalogues of enzyme suppliers, and the enzymes that
do not cut the OCV (preferably constructed as
described above) are identified.
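The search described above can be sketched in Python. For each DNA offset, the sketch asks whether every codon overlapped by the recognition site can be chosen, among the synonymous codons for its amino acid, to agree with the site's bases; because codon choices are independent, a per-codon test suffices. This is an illustrative reimplementation, not the program used here: it handles only exact (non-degenerate) sites on the sense strand, and the function names are hypothetical.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}

def codon_fits(aa, pattern):
    """True if some codon for `aa` agrees with `pattern` ('.' = any base)."""
    return any(all(p in (".", b) for p, b in zip(pattern, codon))
               for codon, a in CODE.items() if a == aa)

def silent_site_positions(protein, site):
    """0-based DNA offsets where `site` can be created without changing `protein`."""
    hits = []
    for start in range(3 * len(protein) - len(site) + 1):
        ok = True
        for ci, aa in enumerate(protein):
            pattern = ["."] * 3
            for k in range(3):
                pos = 3 * ci + k
                if start <= pos < start + len(site):
                    pattern[k] = site[pos - start]   # base the site forces here
            if pattern != ["."] * 3 and not codon_fits(aa, "".join(pattern)):
                ok = False
                break
        if ok:
            hits.append(start)
    return hits

print(silent_site_positions("CMR", "GCATGC"))   # SphI, residues 74-76 of AA_seq2: [1]
print(silent_site_positions("LVP", "GGTACC"))   # KpnI, residues 16-18 of AA_seq2: [2]
```

The two hits correspond to the GCATGC and GGTACC stretches that appear in olig#12 and olig#14. Non-palindromic sites would also need their reverse complement checked, and degenerate sites (such as AccI, GTMKAC) would need ambiguity codes expanded.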

Using the procedure given in Sec. 4.3, we design an
ipbd gene, such as that shown in Table 25. Some
restriction enzymes (e.g. BanI or Hnh I) cut the OCV
too often to be of value.

The entire DNA sequence of the m13cp-bpti fusion
with annotation appears in Table 25, showing the useful
restriction sites and biologically important features,
viz. the lacUV5 promoter, the lacO operator, the Shine-Dalgarno
sequence, the amino acid sequence, the stop
codons, and the transcriptional terminator.

The ipbd gene is synthesized in several steps
using the method described in Sec. 5.1, generating
dsDNA fragments of 150 to 190 base pairs.

The four steps (See Sec. 6.1) by which we clone
synthetic fragments of the m13cp-bpti gene (the osp-ipbd
gene of the present example) into pLG3 and its
derivatives are illustrated in Figure 5.

The sequence to be introduced into pLG3 comprises
a) the segment from RsrII to AvrII (Table 25), b) a
spacer sequence (gccgctcc), and c) the segment from
AsuII to SauI. The segment is 158 bases long and is
synthesized from two shorter synthetic oligo-nts as
described in Sec. 5.1 of the generic specification.

Table 27 shows the antisense strand of the
sequence to be inserted. The 99-base fragment shown in
upper case letters and underscored (5'-CCGTCC....CCTTCG-3' = olig#3)
is synthesized in the standard manner. Similarly, the 100-base-long
fragment of the sense strand shown in lower case
(5'-cgctca....aattg-3' = olig#4) is synthesized. After
annealing, the double-stranded region is extended with
Klenow fragment by the procedure given above to make
the entire 176 bases double stranded. The overlap
region is 23 base pairs long and contains 14 CG pairs
and 9 AT pairs. The DNA between AvrII and AsuII does
not code for anything in the final ipbd gene; it is
there so that the DNA can be cut by both AvrII and
AsuII at the same time in the next step. Eight bases
have been added to the left of RsrII and nine bases
have been added to the left of SauI (same specificity
and cutting pattern as Bsu36I). These bases at the
ends are not part of the final product; they must be
present so that the restriction enzymes can bind and
cut the synthetic DNA to produce specific sticky ends.
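The lengths quoted above are consistent with simple overlap-extension arithmetic: two oligos of 99 and 100 bases that share a 23-base overlap give a 99 + 100 - 23 = 176-base duplex after Klenow fill-in. The Python sketch below shows that calculation, plus a rough Wallace-rule melting temperature for the overlap (2 °C per A:T pair, 4 °C per G:C pair). The Wallace rule is a common rule of thumb for short duplexes and is not part of the procedure described here; the function names are illustrative.

```python
def extended_length(len_a, len_b, overlap):
    """Length of the duplex after two oligos annealed at their 3' ends are filled in."""
    return len_a + len_b - overlap

def wallace_tm(n_gc, n_at):
    """Rough Wallace-rule melting temperature (deg C): 4 per G:C pair, 2 per A:T pair."""
    return 4 * n_gc + 2 * n_at

print(extended_length(99, 100, 23))   # 176, the length given in the text
print(wallace_tm(14, 9))              # 74
print(round(14 / 23, 2))              # GC fraction of the overlap: 0.61
```

The high GC fraction of the 23-base overlap is what lets the two oligos anneal stably before extension.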

The synthetic DNA is cut with both SauI and RsrII
and is ligated to similarly cut dsDNA of pLG3. The
construct with the correct insert is called pLG4.


The second step of the construction of the OCV is
illustrated in Table 28. As in the construction of
pLG4, two pieces of single-stranded DNA are
synthesized: a 99-base-long fragment of the anti-sense
strand ending with P25 and a 99-base-long fragment
starting with pis. Both the synthetic dsDNA and dsRF
pLG4 DNA are cut with both AvrII and AsuII and are
ligated and used to transform E. coli. The construct
carrying this second insert is called pLG5.


Construction of pLG6 proceeds similarly to the
construction of pLG5. The sequence is shown in Table
30. The two single-stranded segments (one from the
anti-sense strand ending with N66 and the other from
the sense strand starting with the third base of the
codon for V58) are synthesized, annealed, and extended
with Klenow fragment. Both the synthetic DNA and RF
pLG5 are cut with both flssHI and &HUII, purified, and
the appropriate pieces are ligated and used to
transform E. coli.


The construction of pLG7 is illustrated in Table
32 and proceeds similarly to the constructions of pLG4,
pLG5, and pLG6. The two single-stranded segments (one
from the anti-sense strand ending with the first base
of the codon for vno and the other beginning with
01) are synthesized, annealed, and extended with
Klenow fragment. Both the synthetic DNA and RF pLG6
are cut with both £b£X and ABUI, purified, and the
appropriate pieces are ligated and used to transform E.
coli. The construct with the correct fourth insert is
called pLG7; the display of BPTI on the outer surface
of LG7 is verified by the methods of Sec. 8.

M13am429 is an amber mutation of M13 used to
reduce non-specific binding by the affinity matrix for
phages derived from M13. M13am429 is derived from wtM13 by
standard genetic methods (MILL72).

LG7 is grown on E. coli strain PE384 in LB
broth with various concentrations of IPTG added to the
medium to induce the osp-ipbd gene. Phage LG7 is
obtained from cells grown with 0.0, 0.1, 1.0, 10.0, or
100.0 uM, or 1.0 mM IPTG, harvested (see Sec. 7) by the
method of Salivar (SALI64), and concentrated to obtain
a stock of ... pfu/ml.
The preferred method of determining whether LG7
displays BPTI on its surface (See Sec. 8) is to
determine whether these phage can retain a labeled
derivative of trypsin (trp) or anhydrotrypsin (AHTrp)
on a filter that allows passage of unbound trp or
AHTrp. Trypsin contains 10 tyrosine residues and can
be iodinated with 125I by standard methods; we denote
the labeled trypsin as "trp*". Labeled anhydrotrypsin
is denoted as "AHTrp*". Other types of labels can be
used on trp or AHTrp, e.g. biotin or a fluorescent
label. AHTrp* or trp* is labeled to an activity of 0.3
uCi/ug. A sample of 10^12 LG7 (10 mM IPTG) is mixed with
1.0 ug of trp* or AHTrp* in 1.0 ml of a buffer of 10 mM
KCl, adjusted to pH 8.0 with 1 mM K2HPO4/KH2PO4. The
mixture is passed through an Amicon MSP1 system fitted
with a membrane filter that allows passage of proteins
smaller than Mr = 300,000. Filters are soaked in
buffer containing trp or AHTrp prior to the analysis.
The filter is washed twice with 0.5 ml of buffer
containing trp or AHTrp. The radioactivity retained on
the filter is quantitated with a scintillation counter
or other suitable device. If each virion displays one
copy of BPTI, then 0.05 ug of protein can be bound, which
would give rise to 3 x 10^4 disintegrations/minute on
the filter.
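The expected count follows from the stated specific activity. A Python sketch of the arithmetic, using 1 uCi = 2.22 x 10^6 disintegrations per minute. The cross-check of the 0.05 ug figure assumes a trypsin molar mass of about 23,300 g/mol, which the text does not state, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23     # particles per mol
DPM_PER_UCI = 2.22e6         # 1 uCi = 3.7e4 Bq = 2.22e6 disintegrations per minute

def expected_dpm(bound_ug, specific_activity_uci_per_ug):
    """Disintegrations per minute from a given mass of labeled protein."""
    return bound_ug * specific_activity_uci_per_ug * DPM_PER_UCI

def bound_mass_ug(n_virions, copies_per_virion, molar_mass_g_per_mol):
    """Mass (ug) of labeled protease captured if each displayed BPTI binds one molecule."""
    moles = n_virions * copies_per_virion / AVOGADRO
    return moles * molar_mass_g_per_mol * 1e6

print(round(expected_dpm(0.05, 0.3)))            # 33300, i.e. about 3 x 10^4 dpm

# Cross-check of the 0.05 ug figure; 23,300 g/mol for trypsin is an assumption.
print(round(bound_mass_ug(1e12, 1, 23_300), 3))  # 0.039, the same order as 0.05 ug
```

Raising `copies_per_virion` shows how the signal scales if each virion displays more than one BPTI.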

An alternative way to quantitate display of BPTI
on the surface of LG7 is to use the stoichiometric
binding between trypsin and BPTI to titrate the BPTI.
A solution that titers 10^12 pfu/ml of a phage is
approximately 1.6 x 10^-9 M in phage if each virion is
infective. The ratio of pfu to total phage can be
determined spectrophotometrically using the molar
extinction coefficients at 260 nm and 280 nm corrected
for the increased length of LG7 as compared to wtM13.
For example, if a 1.0 ml solution that contains 10^12
pfu of LG7 phage grown with 1.0 mM IPTG inhibits
trypsin solutions up to 4.8 x 10^-7 M, we calculate that
there are approximately 300 BPTIs/GP (i.e. (4.8 x 10^-7
mol BPTI/l)/(1.6 x 10^-9 mol phage/l)). Inhibition
of a specified concentration of trypsin is most easily
measured spectrophotometrically using a peptide-linked
dye, such as Nα-benzoyl-Arg-Nan (TSCH87).
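The titration arithmetic can be written out as a small Python sketch. It converts a titer to a molar phage concentration with Avogadro's number, optionally correcting for the fraction of particles that are infective, and divides the inhibited trypsin concentration by it, assuming one trypsin per BPTI. The function names are illustrative.

```python
AVOGADRO = 6.02214076e23  # particles per mol

def phage_molarity(pfu_per_ml, infective_fraction=1.0):
    """Molar concentration of phage particles for a titer in pfu/ml.

    infective_fraction is the pfu-to-total-particle ratio; 1.0 means every virion is infective.
    """
    particles_per_liter = pfu_per_ml / infective_fraction * 1000.0
    return particles_per_liter / AVOGADRO

def bpti_per_phage(inhibited_trypsin_molar, phage_molar):
    """Displayed BPTI copies per genetic package, assuming 1:1 trypsin:BPTI binding."""
    return inhibited_trypsin_molar / phage_molar

c_phage = phage_molarity(1e12)
print(f"{c_phage:.2e} M")                       # 1.66e-09 M
print(round(bpti_per_phage(4.8e-7, 1.6e-9)))    # 300, with the text's rounded molarity
print(round(bpti_per_phage(4.8e-7, c_phage)))   # 289, with the unrounded value
```

The small gap between 289 and 300 comes only from rounding 1.66 x 10^-9 M to 1.6 x 10^-9 M; both agree with "approximately 300".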

Alternatively, binding to an affinity column may
be used to demonstrate the presence of BPTI on the
surface of phage LG7. An affinity column of 2.0 ml
total volume having BioRad Affi-Gel 10 (™) matrix and
30 mg of AHTrp as affinity material is prepared by the
method of BioRad. The void volume (Vy) of this column
is, by hypothesis, 1.0 ml. This affinity column is
denoted (AHTrp).

A sample of 10^12 M13am429 is applied to (AHTrp) in
1.0 ml of 10 mM KCl buffered to pH 8.0 with KH2PO4/
K2HPO4. The column is then washed with the same buffer
until the optical density at 280 nm of the effluent
returns to base line or 4 x Vy have been passed through
the column, whichever comes first. Samples of LG7 or
LG10 are then applied to the blocked (AHTrp) column at
10^12 pfu/ml in 1.0 ml of the same buffer. The column
is then washed again with the same buffer until the
optical density at 280 nm of the effluent returns to
base line or 4 x Vy have been passed through, whichever
comes first. Following this wash, a gradient of KCl
from 10 mM to 2 M in 3 x Vy, buffered to pH 8.0 with
phosphate, is passed over the column. The first KCl
gradient is followed by a KCl gradient running from 2 M
to 5 M in 3 x Vy. The second KCl gradient is followed
by a gradient of guanidinium Cl from 0.0 M to 2.0 M in
2 x Vy in 5 M KCl, buffered to pH 8.0 with
phosphate. Fractions of 50 ul are collected and
assayed for phage by plating 4 ul of each fraction at
suitable dilutions on sensitive cells. Retention of
phage on the column is indicated by the appearance of LG7
phage in fractions that elute significantly later from
the column than control phage LG10 or wtM13. A
successful isolate of LG7 that displays BPTI is
identified, the bpti insert and junctions are
sequenced, and this isolate is used for further work
described below.
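To relate fraction numbers to elution conditions, the elution program above can be tabulated. The Python sketch below assumes linear gradients (the text gives only endpoints and volumes), measures volume from the start of the first KCl gradient, and uses Vy = 1.0 ml and 50 ul fractions. With those values the three gradients span 8 ml, or 160 fractions. The class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    volumes: float     # segment length in multiples of Vy
    kcl_start: float   # M KCl
    kcl_end: float
    gdm_start: float   # M guanidinium Cl
    gdm_end: float

# Elution program from the text; linear gradients are an assumption.
PROGRAM = [
    Segment(3, 0.010, 2.0, 0.0, 0.0),   # KCl 10 mM -> 2 M in 3 x Vy
    Segment(3, 2.0, 5.0, 0.0, 0.0),     # KCl 2 M -> 5 M in 3 x Vy
    Segment(2, 5.0, 5.0, 0.0, 2.0),     # GdmCl 0 -> 2 M in 2 x Vy, in 5 M KCl
]

def conditions_at(volume_ml, vy_ml=1.0):
    """(KCl M, GdmCl M) at a given volume eluted since the gradients began."""
    start = 0.0
    for seg in PROGRAM:
        end = start + seg.volumes * vy_ml
        if volume_ml <= end:
            f = (volume_ml - start) / (end - start)
            return (seg.kcl_start + f * (seg.kcl_end - seg.kcl_start),
                    seg.gdm_start + f * (seg.gdm_end - seg.gdm_start))
        start = end
    raise ValueError("volume beyond the end of the program")

def fraction_table(fraction_ml=0.050, vy_ml=1.0):
    """(fraction number, KCl, GdmCl) at the midpoint of each collected fraction."""
    total = sum(s.volumes for s in PROGRAM) * vy_ml
    n = round(total / fraction_ml)
    return [(i + 1, *conditions_at((i + 0.5) * fraction_ml, vy_ml)) for i in range(n)]

for row in fraction_table()[::40]:
    print(row)
```

Mapping plaque counts onto this table shows at what salt or denaturant concentration LG7 elutes relative to LG10 or wtM13.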

If vgDNA is used to obtain a functional fusion
between a BPTI mutant and M13 CP (vide infra), then DNA
from a clonal isolate is sequenced in the regions that
were variegated. Then gratuitous restriction sites for
useful restriction enzymes are removed if possible by
silent codon changes. The sequence numbers of residues
in OSP-IPBD will be changed by any insertions;
hereinafter, however, we will denote residues inserted
after residue 23 as 23a, 23b, etc. Insertions after
residue 81 will be denoted as 81a, 81b, etc. This
preserves the numbering of residues between C5 and C55
of BPTI. Residue C5 of BPTI is always denoted as 28 in
the fusion; residue C55 of BPTI is always denoted as 78
in the fusion, and the intervening residues have
constant numbers.

Should LG7 phage from cells grown with 10 mM IPTG
fail to display BPTI on its surface, we have several
options. We might try to determine why the
construction failed to work as expected. There are
various possible modes of failure, including: a) BPTI
is not cleaved from the M13 signal sequence, b) BPTI is
cleaved from the M13 CP, and c) the chimeric protein is
made and cleaved after the signal sequence, but the
processed protein is not incorporated into the M13
coat. BPTI has been secreted from E. coli (MARK86);
however, the M13 coat-protein signal sequence was not
used. Therefore problems stemming from the signal
sequence are unlikely, but possible. We could
determine whether BPTI was present in the periplasm or
bound to the inner membrane of LG7-infected cells by
assays using trp* or AHTrp*.

Proteins in the periplasm can be freed through
spheroplast formation using lysozyme and EDTA in a
concentrated sucrose solution (BIRD67, MALA64). If
BPTI were free in the periplasm, it would be found in
the supernatant. Trp* would be mixed with the supernatant
and passed over a non-denaturing molecular sizing
column and the radioactive fractions collected. The
radioactive fractions would then be analyzed by SDS-PAGE
and examined for BPTI-sized bands by silver
staining.

Spheroplast formation exposes proteins anchored in
the inner membrane. Spheroplasts are mixed with AHTrp*
and then either filtered or centrifuged to separate
them from unbound AHTrp*. After washing with
hypertonic buffer, the spheroplasts are analyzed for
extent of AHTrp* binding; alternatively, membrane
proteins are analyzed by western blot analysis.

If BPTI is found free in the periplasm, then we
would expect that the chimeric protein was being
cleaved both between BPTI and the M13 mature coat
sequence and between BPTI and the signal sequence. In
that case, we should alter the BPTI/M13 CP junction by
inserting vgDNA at codons for residues 78-82 of
AA_seq2.

If BPTI is found attached to the inner membrane,
then there are two likely explanations. The first is
that the chimeric protein is being cut after the signal
sequence, but is not being incorporated into the LG7
virion; the treatment would also be to insert vgDNA
between residues 78 and 82 of AA_seq2. The alternative
hypothesis is that BPTI could fold and react with
trypsin even if the signal sequence is not cleaved. N-terminal
amino acid sequencing of trypsin-binding
material isolated from cell homogenate determines what
processing is occurring. If the signal sequence were being
cleaved, we would use the procedure above to vary
residues between C78 and A82; subsequent passes would
add residues after residue 81. If the signal sequence were
not being cleaved, we would vary residues between 23
and 27 of AA_seq2. Subsequent passes through that
process would add residues after 23.

If BPTI were found neither in the periplasm nor on
the inner membrane, then we would expect that the fault
was in the signal sequence or the signal-sequence-to-BPTI
junction. The treatment in this case would be to
vary residues between 23 and 27.
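The diagnostic reasoning above reduces to a small decision table. A Python sketch, with hypothetical names, that returns the AA_seq2 residue window to variegate for each outcome of the periplasm, membrane, and N-terminal sequencing assays:

```python
from enum import Enum

class Location(Enum):
    PERIPLASM_FREE = "free in periplasm"
    INNER_MEMBRANE = "bound to inner membrane"
    NOT_FOUND = "neither"

def junction_to_vary(location, signal_cleaved=None):
    """Return the (first, last) AA_seq2 residues bounding the window to variegate.

    `signal_cleaved` is the N-terminal sequencing result; it is needed only
    when BPTI is found on the inner membrane.
    """
    if location is Location.PERIPLASM_FREE:
        return (78, 82)                      # BPTI/M13 CP junction
    if location is Location.INNER_MEMBRANE:
        if signal_cleaved is None:
            raise ValueError("N-terminal sequencing result required")
        return (78, 82) if signal_cleaved else (23, 27)
    return (23, 27)                          # signal sequence/BPTI junction

print(junction_to_vary(Location.INNER_MEMBRANE, signal_cleaved=False))  # (23, 27)
```

Only the inner-membrane outcome needs the extra sequencing result, which is why it is an optional argument.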

Several experiments that introduce variegation
into the bpti-gene VIII fusion are possible, including:

1) 3 variegated codons between residues 78 and 82
using olig#12 and olig#13,

2) 3 variegated codons between residues 23 and 27
using olig#14 and olig#15,

3) 5 variegated codons between residues 78 and 82
using olig#13 and olig#12a,

4) 5 variegated codons between residues 23 and 27
using olig#15 and olig#14a,

5) 7 variegated codons between residues 78 and 82
using olig#13 and olig#12b, and

6) 7 variegated codons between residues 23 and 27
using olig#15 and olig#14b.
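The size of each variegated library follows from the qfk codon used in olig#12 through olig#14b: q and f each allow four bases and k allows two (T or G), so one qfk codon has 32 DNA sequences, which between them encode all 20 amino acids and the amber stop TAG. A Python sketch of the resulting counts for 3, 5, and 7 variegated codons; the protein figure counts the stop as a 21st outcome, so it is an upper bound on distinct functional proteins. The names are illustrative.

```python
QFK_CODONS = 4 * 4 * 2   # q and f are four-base mixtures, k is T/G
AA_PLUS_STOP = 21        # all 20 amino acids plus the amber stop TAG

def library_size(n_codons):
    """(distinct DNA sequences, distinct translation products) for n qfk codons."""
    return QFK_CODONS ** n_codons, AA_PLUS_STOP ** n_codons

for n in (3, 5, 7):
    dna, protein = library_size(n)
    print(n, dna, protein)
# 3 32768 9261
# 5 33554432 4084101
# 7 34359738368 1801088541
```

Going from 3 to 7 codons increases the DNA diversity by about six orders of magnitude, which bears on how many transformants are needed to sample each library.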

To alter the BPTI-M13 CP junction, we introduce
DNA variegated at codons for residues between 78 and 82
into the SphI and SfiI sites of pLG7. The residues
after the last cysteine are highly variable in amino
acid sequences homologous to BPTI, both in composition
and length; in Table 25 these residues are denoted as
G79, G80, and A81. The first part of the M13 CP is
denoted as A82, E83, and G84. One of the oligo-nts
olig#12, olig#12a, or olig#12b and the primer olig#13
are synthesized by standard methods. The oligo-nts
are:


residue       75  76  77  78  79  80  81  82  83
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|GCT|GAA|-
   84  85  86  87  88  89  90  91
   GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-
   82  83  84  85  86  87
   GCT|GAA|GGT|GAT|GAT|CCG|-
   88  89  90  91
   GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12a

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cGC|ATG|CGT|ACC|TGC|qfk|qfk|qfk|qfk|qfk|-
   81c 81d 82  83  84  85  86  87
   qfk|qfk|GCT|GAA|GGT|GAT|GAT|CCG|-
   88  89  90  91
   GCC|AAA|GCG|GCC|gcg|cc 3'   olig#12b

residue   91  90  89  88  87  86
5' gg|cgc|GGC|CGC|TTT|GGC|CGG|ATC 3'   olig#13
where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. The
primer is complementary to the 3' end of each of the
longer oligo-nts. One of the variegated oligo-nts and
the primer olig#13 are combined in equimolar amounts
and annealed. The dsDNA is completed with all four
(nt)TPs and Klenow fragment. The resulting dsDNA and
RF pLG7 are cut with both SfiI and SphI, purified,
mixed, and ligated. This ligation mixture goes through
the process described in Sec. 15 in which we select a
transformed clone that, when induced with IPTG, binds
AHTrp.
To vary the junction between the M13 signal sequence
and BPTI, we introduce DNA variegated at codons for
residues between 23 and 27 into the KpnI and XhoI
sites of pLG7. The first three residues are highly
variable in amino acid sequences homologous to BPTI.
Homologous sequences also vary in length at the amino
terminus. One of the oligo-nts olig#14, olig#14a, or
olig#14b and the primer olig#15 are synthesized by
standard methods. The oligo-nts are:
residue 17 18 19 20 21 22 23 24 25

5' g | gcc | gcG | GTA | CCG | ATG | CTG | TCT | TTT | GCT | qfk | qfk | -

26 27 28 29 30

qfk | TTC | TGT | CTC | GAG | cgc | ccg | cga 3' olig#14

residue 17 18 19 20 21 22 23 24 25 26

5' g | gcc | gcG | GTA | CCG | ATG | CTG | TCT | TTT | GCT | qfk | qfk | qfk | -

26a 26b 27 28 29 30

qfk | qfk | TTC | TGT | CTC | GAG | cgc | ccg | cga 3' olig#14a


26a 26b 26c 26d 27 28 29 30

qfk | qfk | qfk | qfk | TTC | TGT | CTC | GAG | cgc | ccg | cga 3' olig#14b

5' tcg | cgg | gcg | CTC | GAG | ACA | GAA 3' olig#15


where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and 0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and 0.22 G), and k is a mixture of equal parts of T and G. The bases shown in lower case at either end are spacers. One of the variegated oligo-nts and the primer are combined in equimolar amounts and annealed. The dsDNA is completed with all four (nt)TPs and Klenow fragment. The resulting dsDNA and RF pLG7 are cut with both Kpn I and Xho I, purified, mixed, and ligated. This ligation mixture goes through the process described in Sec. 15, in which we select a transformed clone that, when induced with IPTG, binds AHTrp or trp.
If none of these approaches produces a working chimeric protein, we may try a different signal sequence, a different OSP in M13 (e.g., the gene III protein, for which there is fusion data (SMIT85, CRUZ88)), or another genetic package.


Example 1, Part II

BPTI binds very tightly to trypsin (K_d = 6.0 x 10^-14 M) and to anhydrotrypsin, so these molecules are not preferred for optimizing the amount of BPTI to display on LG7 or the amount of affinity molecule to attach to the column. Tschesche et al. reported on the binding of several BPTI derivatives to various proteases:


Dissociation constants (molar) for BPTI derivatives.

Residue 15 | Trypsin (bovine pancreas) | Chymotrypsin (bovine pancreas) | Elastase (porcine pancreas) | Elastase (human leukocytes)
lysine | 6.0 x 10^-14 | 9.0 x 10^-9 | - | 3.5 x 10^-6
glycine | - | - | + | 7.0 x 10^-9
alanine | + | - | 2.8 x 10^-8 | 2.5 x 10^-9
valine | - | - | 5.7 x 10^-8 | 1.1 x 10^-10
leucine | - | - | 1.9 x 10^-8 | 2.9 x 10^-9


From the report of Tschesche et al., we infer that molecular pairs marked "+" have K_d greater than 3.5 x 10^-6 M and that molecular pairs marked "-" have K_d much greater than 3.5 x 10^-6 M. Because of the wealth of data about the binding of BPTI and various mutants to trypsin and other proteases (TSCH87), we can proceed in various ways. (For other PBDs we can obtain two different monoclonal antibodies, one with a high affinity having K_d of order 10^-11 M, and one with a moderate affinity having K_d on the order of 10^-6 M.) In this example, we may use: a) the moderate binding between BPTI and human leukocyte elastase (HuLE), b) the moderately strong binding of porcine elastase to BPTI(V15), or c) the binding of BPTI(A15) (residue 38 in the pbd gene) to trypsin (weak but detectable) or to porcine pancreatic elastase.

We compare the retention of LG7 virions to the retention of wild-type M13 on {AHTrp}. M13 derivatives having more DNA than wild-type M13 have correspondingly longer virions, so wild-type M13 is not a length-matched control. Thus we will create pLG8, which differs from pLG7 only in having stop codons at codons 2 and 3 and an altered L codon at codon 7 of the osp-ipbd gene. Phage LG8 will have exactly as much DNA as LG7; therefore the LG8 virion is exactly as long as the LG7 virion. LG8 cannot, however, display BPTI on its surface.

To expedite identification of different M13-derived phage, we replace the ampR gene of LG8 with the tetR gene from pBR322 by standard methods. The Bsm I-to-Aat II tetR-bearing fragment of pBR322 is ligated into DNA from pLG8 cut with Xba I and Aat II. The correct construction, having 9.2 kb, is easily distinguished from pBR322 and is called LG10.

The phage LG7 is grown at various levels of IPTG in the medium and harvested in the way previously described. An affinity column having a bed volume of 2.0 ml and supporting an amount of HuLE picked from the range 0.1 mg to 30.0 mg on 1 ml of BioRad Affi-Gel 10 (TM) or Affi-Gel 15 (TM) is designated {HuLE}.
An appropriate set of densities of HuLE on the column is (0.1 mg/ml, 0.5 mg/ml, 2.0 mg/ml, 8.0 mg/ml, 15.0 mg/ml, and 30.0 mg/ml). The V_v of {HuLE} is, by hypothesis, 1.0 ml. The elution of LG7 phage is compared to the elution of LG10 on {HuLE} having varying amounts of HuLE affixed. The columns are eluted in a standard way:

1) 10 mM KCl buffered to pH 8.0 with phosphate, until optical density at 280 nm falls to baseline or 4 x V_v has passed, whichever is first,

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH held at 8.0 with phosphate,

3) a gradient of 2 M to 5 M KCl in 3 x V_v, phosphate buffer to pH 8.0,

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in 2 x V_v, with phosphate buffer to pH 8.0.
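The four-step program above can be written as a function of cumulative effluent volume. This Python sketch is illustrative only: the function name is hypothetical, it assumes step 1 runs for its full 4 x V_v (in practice it may stop earlier, when the 280 nm reading reaches baseline), and it does not model the phosphate buffer.

```python
def standard_elution(volume_ml: float, vv_ml: float = 1.0) -> tuple[float, float]:
    """Return (KCl, guanidinium Cl) molarities delivered at a cumulative
    effluent volume, for the four-step program listed above.

    Assumes step 1 runs its full 4 x V_v; phosphate buffer (pH 8.0) is
    present throughout and is not modeled.
    """
    # (length in units of V_v, KCl start, KCl end, GdmCl start, GdmCl end), molar
    segments = [
        (4.0, 0.010, 0.010, 0.0, 0.0),  # 1) isocratic 10 mM KCl wash
        (3.0, 0.010, 2.0, 0.0, 0.0),    # 2) linear 10 mM -> 2 M KCl
        (3.0, 2.0, 5.0, 0.0, 0.0),      # 3) linear 2 M -> 5 M KCl
        (2.0, 5.0, 5.0, 0.0, 0.8),      # 4) 5 M KCl, linear 0 -> 0.8 M GdmCl
    ]
    if volume_ml < 0:
        raise ValueError("volume must be non-negative")
    start = 0.0
    for length_vv, k0, k1, g0, g1 in segments:
        end = start + length_vv * vv_ml
        if volume_ml <= end:
            frac = (volume_ml - start) / (end - start)  # 0 at segment start, 1 at end
            return k0 + frac * (k1 - k0), g0 + frac * (g1 - g0)
        start = end
    raise ValueError("volume lies beyond the end of the program")

# Example: halfway through step 2 on a column with V_v = 1.0 ml.
kcl, gdm = standard_elution(5.5)
print(f"KCl {kcl:.3f} M, GdmCl {gdm:.1f} M")
```

A piecewise-linear table keeps each step's start and end concentrations next to its length, which mirrors how the program is written out in the text.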

The preferred level of induction (IPTG_optimal) and amount of affinity molecule on the matrix (DoAMoM_optimal) are those settings that give the sharpest LG7 elution peak that shows significant retardation as compared to LG8, which carries no BPTI. By hypothesis, the best separation occurs for the amount of BPTI/GP produced when the cells are induced with 10.0 uM IPTG and when 4.0 mg HuLE/ml is applied to BioRad Affi-Gel 10 (TM).

When the amount of BPTI/GP and the amount of HuLE/(volume of support) have been optimized, we turn to optimization of elution rate, initial ionic strength, and the amount of GP/(volume of support). These parameters can be optimized separately.

Using optimal BPTI/GP and HuLE/(volume of support), we measure the elution volume of LG7 and LG8 for different elution rates, viz. 1, 1/2, 1/4, 1/8, and 1/16 times the maximum flow rate. By hypothesis, 1/4 of the maximum elution rate is better than 1/2, but 1/8 is about the same as 1/4. Therefore 1/4 of the maximum elution rate will be used.

Elution volumes of LG7 obtained from cells grown on media that are 2.0 mM in IPTG are measured at optimal DoAMoM and elution rate for loadings of 10^9, 10^10, 10^11, and 10^12 pfu. By hypothesis, 10^12 pfu of pure LG7 overloads the column, and a significant number of phage elute before their characteristic position in the KCl gradient. We also find that 10^11 pfu overloads the column only slightly, and that 10^10 pfu does not overload the column. Because the affinity separation in Sec. 15 will involve a population in which no single member is more than one part in 10^4, no one species in 10^12 pfu of a variegated population is present at more than 10^8 pfu, well below the 10^10 pfu that does not overload the column. We conclude that 10^12 pfu of a variegated population could be applied to a column of 1.0 ml matrix volume without overloading with respect to any one species. The overloading of a 1.0 ml column by 10^12 pfu also indicates that the initial column that captures indiscriminately adhesive phage should be 5 to 10 times as large as the column that supports the target material.

Elution volumes of LG7 and LG10 obtained from cells grown on media that are 2.0 mM in IPTG are measured at optimal conditions and for a loading of 10^10 pfu for various initial ionic strengths: 1.0 mM, 5.0 mM, 10.0 mM, 20.0 mM, and 50.0 mM. We may find, for example, that LG10 is slightly retarded by the column when loaded at 1.0 mM KCl, but that LG7 always comes off the column at its characteristic place in the gradient. We use 10.0 mM as the initial ionic strength in all remaining affinity separations.

To determine the sensitivity of chromatography of phage that display variants of BPTI on their surfaces (Sec. 10.1), we prepare artificial mixtures of two closely related phage that differ only at one residue in the BPTI domain. One variety of phage has strong affinity for the column used in this step, while the other has no affinity for the column. We chromatograph these mixtures to discover how little of the phage that binds to the column can be detected within a large majority of phage that do not bind the column.

For these tests we choose AHTrp as AfM(BPTI). A column having 2 ml bed volume is prepared with (DoAMoM_optimal mg of AHTrp)/(ml of Affi-Gel 10 (TM)). The column is called {AHTrp} and has V_v = 1.0 ml.

A new phage, LG9, is prepared that displays BPTI(V15) as IPBD, in contrast to LG7, which displays BPTI(K15, wild-type) as IPBD. Residue 15 of BPTI is residue 38 of the osp-ipbd gene. We introduce the change K38 to V by replacement of a short segment of the osp-ipbd gene between Apa I and Stu I. The correct construction is called pLG9. To expedite differentiation between LG7 and an LG9-derivative phage, we replace the ampR gene of LG9 with the tetR gene from pBR322. DNA from pBR322 between Bsm I (1353, blunted) and Aat II (1428) is ligated to dsDNA from pLG9 cut with Xba I (blunted) and Aat II. The correct construction, having 9.2 kb, is easily distinguished from pBR322 and is called LG11. DNA from phage LG11 is sequenced in the vicinity of the junctions of the newly inserted tetR gene to confirm the construction.

LG7 and LG11 are grown with optimum IPTG (2.0 mM) and harvested. Mixtures are prepared in the ratios

LG7 : LG11 :: 1 : V_lim

where V_lim ranges upward from 10^3 by factors of 10. Large values of V_lim are tested first; once a V_lim is found that allows recovery of LG7, smaller values of V_lim are not tested.

The column {AHTrp} is first blocked by treatment with 10^11 virions of M13am429 in 100 ul of 10 mM KCl buffered to pH 8.0 with phosphate; the column is washed with the same buffer until OD250 returns to baseline or 4 x V_v have passed through the column, whichever comes first. One of the mixtures of LG7 and LG11 containing 10^12 pfu in 1 ml of the same buffer is applied to {AHTrp}. The column is eluted in a standard way:

1) 10 mM KCl buffered to pH 8.0 with phosphate, until optical density at 280 nm falls to baseline or 4 x V_v has passed, whichever is first (discard effluent),

2) a gradient of 10 mM to 2 M KCl in 3 x V_v, pH held at 8.0 with phosphate (30 x 100 ul fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_v, phosphate buffer to pH 8.0 (30 x 100 ul fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in 2 x V_v, with phosphate buffer to pH 8.0 (20 x 100 ul fractions),

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1.2 x V_v, with phosphate buffer to pH 8.0 (12 x 100 ul fractions).
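The fraction counts in steps 2 to 5 follow from each step's length (in units of V_v = 1.0 ml) and the 100 ul fraction size. A small Python sketch of that bookkeeping, using exact rational arithmetic; the names are illustrative only:

```python
from fractions import Fraction

VV_UL = 1000       # V_v = 1.0 ml for {AHTrp}, in microliters
FRACTION_UL = 100  # collected fraction size

# Collected steps of the program above: (step, length in units of V_v).
STEPS = [
    ("2) 10 mM -> 2 M KCl", Fraction("3")),
    ("3) 2 M -> 5 M KCl", Fraction("3")),
    ("4) 5 M KCl, 0 -> 0.8 M GdmCl", Fraction("2")),
    ("5) 5 M KCl + 0.8 M GdmCl", Fraction("1.2")),
]

def fraction_count(length_vv: Fraction, vv_ul: int = VV_UL,
                   fraction_ul: int = FRACTION_UL) -> int:
    """Number of fractions that exactly collect a step of the given length."""
    volume_ul = length_vv * vv_ul          # exact rational arithmetic
    if volume_ul % fraction_ul != 0:
        raise ValueError("step volume is not a whole number of fractions")
    return int(volume_ul / fraction_ul)

counts = [fraction_count(length) for _, length in STEPS]
for (label, _), n in zip(STEPS, counts):
    print(f"{label}: {n} x {FRACTION_UL} ul")
print("total fractions:", sum(counts))
```

Using `Fraction` avoids floating-point surprises when a step such as 1.2 x V_v is divided into 100 ul fractions.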

Samples of 4 ul from each fraction are plated at suitable dilution on phage-sensitive, non-suppressing (Sup-) cells (so that M13am429 will not grow). A sample of the column matrix is also used as inoculum for phage-sensitive Sup- cells. Plaques are transferred to ampicillin-containing LB agar, and AmpR colonies are tested for display of BPTI(K15) by use of trp* or AHTrp*.

By hypothesis, V_lim = 4.0 x 10^8 is the largest value for which LG7 can be recovered. Thus C_sensi = 4.0 x 10^8. Three cycles of chromatography are required to isolate LG7, so the first approximation to C_eff is the cube root of C_sensi, 740 (= exp(log_e(4.0 x 10^8)/3)).
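Because three cycles must together supply the overall enrichment C_sensi, the per-cycle figure is the cube root of C_sensi. A minimal Python check (the function name is ours):

```python
import math

def per_cycle_enrichment(c_sensi: float, cycles: int) -> float:
    """Enrichment needed per cycle so that `cycles` cycles multiply to c_sensi:
    exp(ln(c_sensi) / cycles)."""
    return math.exp(math.log(c_sensi) / cycles)

print(f"{per_cycle_enrichment(4.0e8, 3):.1f}")  # 736.8; the text rounds this to 740
```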

We now determine the efficiency of the affinity separation (Sec. 10.2). This is done by: a) preparing mixtures of LG7 and LG11 in the ratio 1:Q, b) enriching the population for LG7 for one separation cycle, and c) determining the fraction of LG7 in the last phage-bearing fraction. When Q is 1.5 x 10^4, 3% of colonies are BPTI positive. When Q is 1.5 x 10^3, 60% of the colonies are BPTI positive. Thus we calculate C_eff = 0.60 x 1.5 x 10^3 = 900.
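The estimate above multiplies the observed fraction of BPTI-positive colonies by the initial excess Q of the non-binding phage. A one-function Python version, with a hypothetical name, reproduces the number:

```python
def efficiency_estimate(fraction_positive: float, q: float) -> float:
    """C_eff as estimated in the text: fraction of BPTI-positive colonies
    after one enrichment cycle times the initial ratio Q (LG11 per LG7)."""
    return fraction_positive * q

c_eff = efficiency_estimate(0.60, 1.5e3)
print(f"C_eff = {c_eff:.0f}")  # C_eff = 900
```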


Our hypothetical LG7 should display one or more BPTI domains on each virion. The osp-ipbd gene is under control of the lacUV5 promoter, so expression levels of BPTI-M13 CP can be manipulated via [IPTG]. This construct may be used to develop many different binding proteins, all based on BPTI. An optimum level of induction and amount of AfM(PBD) (= DoAMoM_optimal = 2.0 mg/(ml of support)) should have been determined; target molecules will be applied to columns in this amount in the process disclosed in Sec. 15.1. These optimum levels may be adequate for all targets and all variegations of BPTI displayed on derivatives of M13 based on LG7, but some further optimization may be needed if other values of pH or temperature are used.

Other pbd gene fragments may be substituted for the bpti gene fragment in pLG7 with a high likelihood that the PBD will appear on the surface of the new LG7 derivative.

Example 1, Part III

HHMb is chosen as a typical protein target; any other protein could be used. HHMb satisfies all of the criteria for a target: 1) it is large enough to be applied to an affinity matrix, 2) after attachment it is not reactive, and 3) after attachment there is sufficient unaltered surface to allow specific binding by PBDs.

The essential information for HHMb is known: 1) HHMb is stable at least up to 70°C, between pH 4.4 and 9.3, 2) HHMb is stable up to 1.6 M guanidinium Cl, 3) the pI of HHMb is 7.0, 4) for HHMb, M_r = 16,000, 5) HHMb requires haem, and 6) HHMb has no proteolytic activity.

In addition, the following information about HHMb and other myoglobins is available: 1) the sequence of HHMb, 2) the 3D structure of sperm whale myoglobin (HHMb has 19 amino acid differences, and it is generally assumed that the 3D structures are almost identical), 3) its lack of enzymatic activity, and 4) its lack of toxicity.

We set the specifications of an SBD as:

1) T = 25°C
2) pH = 8.0
3) Acceptable solutes:
   A) for binding:
      i) phosphate, as buffer, 0 to 20 mM, and
      ii) KCl, 10 mM,
   B) for column elution:
      i) phosphate, as buffer, 0 to 30 mM,
      ii) KCl, up to 5 M, and
      iii) guanidinium Cl, up to 0.8 M.
4) Acceptable K_d < 1.0 x 10^-8 M.

We choose LG7 as GP(IPBD).

Residues to be varied are picked, in part, through the use of interactive computer graphics to visualize the structures. In this section, all residue numbers refer to BPTI. We pick a set of residues that forms a surface such that all residues can contact one target molecule. Information relevant to choosing BPTI residues to vary includes: 1) the 3D structure, 2) solvent accessibility of each residue (LEEB71), 3) a compilation of sequences of other proteins homologous to BPTI, and 4) knowledge of the structural nature of different amino acid types.

Tables 16 and 34 indicate which residues of BPTI: a) have substantial surface exposure, and b) are known to tolerate other amino acids in other closely related proteins. We use interactive computer graphics to pick sets of eight to twenty residues that are exposed and variable and such that all members of one set can touch a molecule of the target material at one time. If BPTI has a small amino acid at a given residue, that amino acid may not be able to contact the target simultaneously with all the other residues in the interaction set, but a larger amino acid might well make contact. A charged amino acid might affect binding without making direct contact. In such cases, the residue should be included in the interaction set, with a notation that larger residues might be useful. In a similar way, large amino acids near the geometric center of the interaction set may prevent residues on either side of the large central residue from making simultaneous contact. If a small amino acid, however, were substituted for the large amino acid, then the surface would become flatter and residues on either side could make simultaneous contact. Such a residue should be included in the interaction set with a notation that small amino acids may be useful.

Table 35 was prepared from standard model parts and shows the maximum span between C_beta and the tip of each type of side group. C_beta is used because it is rigidly attached to the protein main chain; rotation about the C_alpha-C_beta bond is the most important degree of freedom for determining the location of the side group.

Table 34 indicates five surfaces that meet the given criteria. The first surface comprises the set of residues that contacts trypsin in the complex of trypsin with BPTI as reported in the Brookhaven Protein Data Bank entry "1TPA". This set is indicated by the number "1". The exposed surface of the residues in this set (taken from Table 16) totals 1148 Å^2 and approximates the area of contact between BPTI and trypsin.

Other surfaces, numbered 2 to 5, were picked by first picking one exposed, variable residue and then picking neighboring residues until a surface was defined. The choice of sets of residues shown in Table 34 is in no way exhaustive or unique; other sets of variable, surface residues can be picked. Hereinafter we refer to K15 as being at the top of the molecule, while the carboxy and amino termini are at the bottom.

Solvent accessibilities are useful, easily tabulated indicators of a residue's exposure. Solvent accessibilities must be used with some caution; small amino acids are under-represented and large amino acids over-represented. The user must consider what the solvent accessibility of a different amino acid would be when substituted into the structure of BPTI.

To create specific binding between a derivative of BPTI and HHMb, we will vary the residues in set #2. This set includes the twelve principal residues 17(R), 19(I), 21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V), 48(A), 49(E), and 52(M) (Sec. 13.1.1). None of the residues in set #2 is completely conserved in the sample of sequences reported in Table 34; thus we can vary them with a high probability of retaining the underlying structure. Independent substitution at each of these twelve residues of the amino acid types observed at that residue would produce approximately 4.4 x 10^9 amino acid sequences and the same number of surfaces.

BPTI is a very basic protein. This property has been used in isolating and purifying BPTI and its homologues, so the high frequency of arginine and lysine residues may reflect bias in isolation and is not necessarily required by the structure. Indeed, SCI-III from Bombyx mori contains seven more acidic than basic groups (SASA84).

Residue 17 is highly variable and fully exposed and can contain R, K, A, Y, H, F, L, M, T, G, Y, P, or S. All types of amino acids are seen: large, small, charged, neutral, and hydrophobic. That no acidic groups are observed may be due to bias in the sample.

Residue 19 is also variable and fully exposed, containing P, R, I, S, K, Q, and L.

Residue 21 is not very variable, containing F or Y in 31 of 33 cases and I and W in the remaining cases. The side group of Y21 fills the space between T32 and the main chain of residues 47 and 48. The OH at the tip of the Y side group projects into the solvent. Clearly one can vary the surface by substituting Y or F so that the surface is either hydrophobic or hydrophilic in that region. It is also possible that the other aromatic amino acid (viz. H) or the other hydrophobics (L, M, or V) might be tolerated.

Residue 27 most often contains A, but S, K, L, and T are also observed. On structural grounds, this residue will probably tolerate any hydrophilic amino acid and perhaps any amino acid.

Residue 28 is G in BPTI. This residue is in a turn, but is not in a conformation peculiar to glycine. Six other types of amino acids have been observed at this residue: K, N, Q, R, H, and N. Small side groups at this residue might not contact HHMb simultaneously with residues 17 and 34. Large side groups could interact with HHMb at the same time as residues 17 and 34. Charged side groups at this residue could affect binding of HHMb on the surface defined by the other residues of the principal set. Any amino acid, except perhaps P, should be tolerated.


Residue 29 is highly variable, most often containing L. This fully exposed position will probably tolerate almost any amino acid except, perhaps, P.


Residues 31, 32, and 34 are highly variable, exposed, and in extended conformations; any amino acid should be tolerated.

Residues 48 and 49 are also highly variable and fully exposed; any amino acid should be tolerated.

Residue 52 is in an alpha helix; any amino acid, except perhaps P, might be tolerated.
Now we consider possible variation of the secondary set (Sec. 13.1.2) of residues that are in the neighborhood of the principal set. Neighboring residues that might be varied at later stages include 9(P), 11(T), 15(K), 16(A), 18(I), 20(R), 22(F), 24(N), 26(K), 35(Y), 47(S), 50(D), and 53(R).

Residue 9 is highly variable, extended, and exposed. Residue 9 and residues 48 and 49 are separated by a bulge caused by the ascending chain from residue 31 to 34. For residue 9 and residues 48 and 49 to contribute simultaneously to binding, either the target must have a groove into which the chain from 31 to 34 can fit, or all three residues (9, 48, and 49) must have large amino acids that effectively reduce the radius of curvature of the BPTI derivative.

Residue 11 is highly variable, extended, and exposed. Residue 11, like residue 9, is somewhat removed from the surface defined by the principal residues and will contribute to binding in the same circumstances.

Residue 15 is highly varied. The side group of residue 15 points away from the face defined by set #2. Changes of charge at residue 15 could affect binding on the surface defined by residue set #2.

Residue 16 is varied but points away from the surface defined by the principal set. Changes in charge at this residue could affect binding on the face defined by set #2.

Residue 18 is I in BPTI. This residue is in an extended conformation and is exposed. Five other amino acids have been observed at this residue: M, F, L, V, and T. Only T is hydrophilic. The side group points directly away from the surface defined by residue set #2. Substitution of charged amino acids at this residue could affect binding at the surface defined by residue set #2.


Residue 20 is R in BPTI. This residue is in an extended conformation and is exposed. Four other amino acids have been observed at this residue: A, S, L, and Q. The side group points directly away from the surface defined by residue set #2. Alteration of the charge at this residue could affect binding at the surface defined by residue set #2.


Residue 22 is only slightly varied, being Y, F, or H in 30 of 33 cases. Nevertheless, A, N, and S have been observed at this residue. Amino acids such as L, T, or Q could be tried here. Alterations at residue 22 may affect the mobility of residue 21; changes in charge at residue 22 could affect binding at the surface defined by residue set #2.


Residue 24 shows some variation, but probably cannot interact with one molecule of the target simultaneously with all the residues in the principal set. Variation in charge at this residue might have an effect on binding at the surface defined by the principal set.


Residue 26 is highly varied. Changes in charge at this residue may affect binding at the surface defined by set #2; substitutions may also affect the mobility of residue 27, which is in the principal set.


Residue 35 is most often Y, although W has been observed. The side group of residue 35 is buried, but substitution of F or W could affect the mobility of residue 34.

Residue 47 is always T or S in the sequence sample used. The O_gamma probably accepts a hydrogen bond from the NH of residue 50 in the alpha helix. Nevertheless, there is no overwhelming steric reason to preclude other amino acid types at this residue. In particular, other amino acids whose side groups can accept hydrogen bonds, viz. N, D, Q, and E, may be acceptable here.

Residue 50 is often an acidic amino acid, but other amino acids are possible.

Residue 53 is often R, but other amino acids have been observed at this residue. Changes of charge may affect binding to the amino acids in interaction set #2.

From published models (HUBE77, WLOD84) one can see that R39 is on the opposite side of BPTI from the surface defined by the residues in set #2. Therefore, variation at residue 39 at the same time as variation of some residues in set #2 is much less likely to improve binding that occurs along surface #2 than is variation of the other residues in set #2.

In addition to the twelve principal residues and 13 secondary residues, there are two other residues, 30(C) and 33(F), involved in surface #2 that we will probably not vary, at least not until late in the procedure. These residues have their side groups buried inside BPTI and are conserved. Changing these residues does not change the surface nearly so much as does changing residues in the principal set. These buried, conserved residues do, however, contribute to the surface area of surface #2. The surface of residue set #2 is comparable to the area of the trypsin-binding surface. Principal residues 17, 19, 21, 27, 28, 29, 31, 32, 34, 48, 49, and 52 have a combined solvent-accessible area of 946.9 Å^2. Secondary residues 9, 11, 15, 16, 18, 20, 22, 24, 26, 35, 47, 50, and 53 have a combined surface of 1041.7 Å^2. Residues 30 and 33 have exposed surface totaling 38.2 Å^2. Thus the three groups' combined surface is 2026.8 Å^2.
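The combined area is the sum of the three group totals quoted in this paragraph. A trivial Python check (the dictionary labels are ours):

```python
# Solvent-accessible areas (square Angstroms) quoted above for surface #2.
AREAS = {
    "principal residues (12)": 946.9,
    "secondary residues (13)": 1041.7,
    "buried, conserved residues C30 and F33": 38.2,
}

total = round(sum(AREAS.values()), 1)  # round away floating-point noise
print(f"combined surface: {total} A^2")  # combined surface: 2026.8 A^2
```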

Residue 30 is C in BPTI and is conserved in all homologous sequences. It should be noted, however, that C14/C38 is conserved in all natural sequences, yet Marks et al. (MARK87) showed that changing both C14 and C38 to A,A or T,T yields a functional trypsin inhibitor. Thus it is possible that BPTI-like molecules will fold if C30 is replaced.

Residue 33 is F in BPTI and in all homologous sequences. Visual inspection of the BPTI structure suggests that substitution of Y, M, H, or L might be tolerated.

Given our hypothetical affinity separation sensitivity, C_sensi, we decide to vary six residues, leaving some margin for errors in the actual base composition of variegated bases. To obtain maximal recognition, we choose residues from the principal set that are as far apart as possible. Table 36 shows the distances between the beta carbons of residues in the principal and peripheral sets. R17 and V34 are at one end of the principal surface. Residues A27, G28, L29, A48, E49, and M52 are at the other end, about twenty Angstroms away. We will vary residues 17 and 34 and, from the far end, residues 27, 29, and 48. Residues 28, 49, and 52 will be varied in later rounds.

Of the remaining principal residues, 21 is left to later variations. Among residues 19, 31, and 32, we arbitrarily pick 19 to vary.

Unlimited variation of six residues produces 6.4 x 10^7 amino acid sequences. By hypothesis, C_sensi is 1 in 4 x 10^8. Table 37 shows the programmed variegation at the chosen residues. The parental sequence is present as 1 part in 5.5 x 10^7, but the least favored sequences are present at only 1 part in 4.2 x 10^9. Among single-amino-acid substitutions from the PPBD, the least favored is F17-I19-A27-L29-V34-A48 and has a calculated abundance of 1 part in 1.6 x 10^8. Using the optimal qfk codon, we can recover the parental sequence and all one-amino-acid substitutions to the PPBD if actual nt compositions come within 5% of programmed
compositions. The number of transformants is M_ntv = 1.0 x 10^8 (also by hypothesis); thus we will produce most of the programmed sequences.
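The reasoning in this paragraph compares each programmed abundance with the hypothetical detection limit C_sensi = 4 x 10^8. A small Python sketch (names are ours) makes the comparison explicit, using the abundances quoted in the paragraph:

```python
N_VARIED = 6
C_SENSI = 4.0e8  # by hypothesis: a species at 1 part in 4 x 10^8 is still recoverable

def recoverable(one_part_in: float, c_sensi: float = C_SENSI) -> bool:
    """True if a species present at 1 part in `one_part_in` is no rarer
    than the sensitivity limit of the affinity separation."""
    return one_part_in <= c_sensi

print(20 ** N_VARIED)  # 64000000, i.e. 6.4 x 10^7 amino acid sequences
for label, one_part_in in [
    ("parental sequence", 5.5e7),
    ("least-favored single substitution", 1.6e8),
    ("least-favored sequence overall", 4.2e9),
]:
    print(label, recoverable(one_part_in))
```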

The residue numbers above refer to mature BPTI. Since Table 25 refers to the pre-M13CP-BPTI protein, all mature BPTI sequence numbers have been increased by the length of the signal sequence, 23. Thus, we wish to vary residues 40, 42, 50, 52, 57, and 71. A DNA subsequence containing all these codons is found between the Apa I site at base 191 and the Sph I site at base 309 of the osp-ipbd gene. Among Apa I, Dra II, and Pss I, Apa I is preferred because it recognizes six bases without any ambiguity and will cut fewer sequences in the vgDNA. Gratuitous restriction sites can be avoided in some cases by use of codon ambiguity: changing the codon for G51 from GGC to GGT makes it impossible to generate an Apa I site at codons 50, 51, and 52.
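Two bookkeeping steps in this paragraph can be sketched in Python: converting mature BPTI residue numbers to pre-M13CP-BPTI numbering by adding the 23-residue signal length, and scanning a sequence for an Apa I recognition site (GGGCCC). The function names are ours, and the codons in the second part are hypothetical, chosen only to show how GGC at codon 51 can complete a site that GGT cannot.

```python
SIGNAL_LENGTH = 23  # M13 signal-sequence length stated in the text

def precursor_number(mature_residue: int) -> int:
    """Convert a mature-BPTI residue number to pre-M13CP-BPTI numbering."""
    return mature_residue + SIGNAL_LENGTH

def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every exact (possibly overlapping) match."""
    dna, site = dna.upper(), site.upper()
    return [i for i in range(len(dna) - len(site) + 1)
            if dna[i:i + len(site)] == site]

print([precursor_number(r) for r in (17, 19, 27, 29, 34, 48)])  # [40, 42, 50, 52, 57, 71]

# Hypothetical codons 50-52: a G-ending codon 50 (allowed by k), then codon 51,
# then a C-starting codon 52 (allowed by q). Apa I recognizes GGGCCC.
print(find_sites("CTG" + "GGC" + "CCT", "GGGCCC"))  # [2]: GGC at codon 51 completes a site
print(find_sites("CTG" + "GGT" + "CCT", "GGGCCC"))  # []: GGT cannot
```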


Each piece of dsDNA to be synthesized needs six to eight bases added at either end to allow cutting with restriction enzymes and is shown in Table 37. The first synthetic base (before cutting with Apa I and Sph I) is 184 and the last is 322. There are 142 bases to be synthesized. The center of the piece to be synthesized lies between Q54 and V57. The overlap cannot include varied bases, so we choose bases 245 to 256 as the overlap, which is 12 bases long. Note that the codon for F56 has been changed to TTC to increase the GC content of the overlap. The amino acids that are being varied are marked as X with a plus over them. Codons 57 and 71 are synthesized on the sense (bottom) strand. The design calls for "qfk" in the antisense strand, so the sense strand contains (from 5' to 3') a) equal parts C and A (i.e., the complement of k), b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G) (i.e., the complement of f), and c) (0.26 T, 0.26 A, 0.30 C, and 0.18 G) (i.e., the complement of q).
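The sense-strand mixtures are the base-by-base complements of k, f, and q, read in reverse order. The Python sketch below (names are ours) derives them from the programmed mixtures of q, f, and k given earlier in this example.

```python
# Programmed base mixtures from the text (fraction of each base).
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K = {"T": 0.50, "G": 0.50}

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_mixture(mix: dict[str, float]) -> dict[str, float]:
    """Base composition of the opposite strand at a position with mixture `mix`."""
    return {PAIR[base]: frac for base, frac in mix.items()}

# A codon 5'-q f k-3' on one strand is read 5'-k' f' q'-3' on the other,
# where ' marks the complementary mixture.
for name, mix in (("k'", K), ("f'", F), ("q'", Q)):
    comp = complement_mixture(mix)
    print(name, {base: comp[base] for base in "TACG" if base in comp})
```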


Each residue that is encoded by "qfk" has 21 possible outcomes: each of the 20 amino acids plus stop. Table 12 gives the distribution of amino acids encoded by "qfk", assuming 5% errors. The abundance of the parental sequence is the product of the abundances of R x I x A x L x V x A. The abundance of the least-favored sequence is 1 in 4.2 x 10^9.
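The 21 outcomes and the parental abundance can be reproduced approximately from the standard genetic code. The Python sketch below is our own; unlike Table 12, it assumes error-free synthesis (no 5% error model), so its parental abundance, about 1 part in 4.9 x 10^7, differs somewhat from the 1 part in 5.5 x 10^7 quoted earlier.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[i] for i, (a, b, c) in enumerate(product(BASES, repeat=3))}

# Programmed mixtures for the first (q), second (f), and third (k) codon positions.
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K = {"T": 0.50, "G": 0.50}

def qfk_distribution() -> dict[str, float]:
    """Probability of each amino acid ('*' = stop) for an error-free qfk codon."""
    dist: dict[str, float] = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(Q.items(), F.items(), K.items()):
        aa = CODE[b1 + b2 + b3]
        dist[aa] = dist.get(aa, 0.0) + p1 * p2 * p3
    return dist

dist = qfk_distribution()
parental = 1.0
for aa in "RIALVA":  # R17, I19, A27, L29, V34, A48
    parental *= dist[aa]
print(len(dist), "outcomes")  # 20 amino acids plus stop (TAG)
print(f"parental sequence: 1 part in {1 / parental:.2e}")
```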


Olig#27 and olig#28 are annealed and extended with Klenow fragment and all four (nt)TPs. Both the ds synthetic DNA and RF pLG7 DNA are cut with both Apa I and Sph I. The cut DNA is purified, and the appropriate pieces are ligated (Sec. 14.1) and used to transform competent PE383 (Sec. 14.2). In order to generate a sufficient number of transformants, we start with 5.0 l of cells:

1) culture E. coli in 5.0 l of LB broth at 37°C until cell density reaches 5 x 10^7 to 7 x 10^7 cells/ml,

2) chill on ice for 65 minutes, centrifuge the cell suspension at 4000g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in 1667 ml of an ice-cold, sterile solution of 60 mM CaCl2,

4) chill on ice for 15 minutes, and then centrifuge at 4000g for 5 minutes at 4°C,

5) resuspend cells in 2 x 400 ml of ice-cold, sterile 60 mM CaCl2; store cells at 4°C for 24 hours,

6) add DNA (100 µg) in 20 ml of ligation or TE buffer; mix, incubate on ice for minutes,

7) distribute into 200 µl aliquots and heat shock cells at 42°C for 20 seconds,

8) add 200 ml LB broth and incubate at 37°C for 1 hour,

9) add the culture to 2.0 l of LB broth containing ampicillin at 35-100 µg/ml, and culture overnight at 37°C,

10) after 6 hours, remove 200 ml and plate 0.5 ml portions with log-phase JM107 on LB agar, using the soft-agar overlay technique. Phage are prepared from the soft agar,

11) centrifuge the overnight culture to remove cells, and pellet phage (MESS83),

12) harvest virions by the method of Salivar et al. (SALX64).
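The volumes in steps 1 to 7 imply a few numbers that are useful when scaling the protocol. A rough sketch, assuming no cell loss during the washes and that the 20 ml of DNA is added to the full 2 x 400 ml suspension before aliquoting (neither is stated in the text).

```python
# Bookkeeping for steps 1-7 of the 5.0 l CaCl2 transformation.
# Assumes no cell loss during washes and that the DNA (step 6) is
# added to the full 2 x 400 ml suspension before aliquoting.
CULTURE_ML = 5000.0          # step 1: 5.0 l of LB broth
DENSITIES = (5e7, 7e7)       # step 1: harvest density range, cells/ml
FINAL_ML = 2 * 400.0         # step 5: resuspension volume
DNA_ML = 20.0                # step 6: DNA in ligation or TE buffer
ALIQUOT_ML = 0.2             # step 7: 200 µl aliquots

for density in DENSITIES:
    total_cells = CULTURE_ML * density
    print(f"{total_cells:.2e} cells, {total_cells / FINAL_ML:.2e} cells/ml after step 5")

concentration_factor = CULTURE_ML / FINAL_ML
n_aliquots = round((FINAL_ML + DNA_ML) / ALIQUOT_ML)
print(f"{concentration_factor}x concentrated, about {n_aliquots} aliquots to heat-shock")
```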

15 

It is important to: a) use all or nearly all of the vgDNA synthesized in the ligation, b) use all or nearly all of the ligation mixture to transform cells, and c) culture all or nearly all of the transformants. These measures are directed at maintaining diversity.

It is important to collect virions in a way that samples all or nearly all of the transformants. Because F− cells are used in the transformation, multiple infections do not pose a problem in the overnight phage production. F' cells are used for phage production in agar.

HHMb has a pI of 7.0, and we carry out chromatography at pH 8.0 so that HHMb is slightly negative while BPTI and most of its mutants are positive. HHMb is fixed (Sec. 15.1) to a 2.0 ml column on Affi-Gel 10 or Affi-Gel 15 (TM) at 4.0 mg/ml of support matrix, the same density that is optimal for a column supporting trp.
To remove variants of BPTI with strong, indiscriminate binding for any protein or for the support matrix (Sec. 15.2), we pass the variegated population of virions over a column that supports bovine serum albumin (BSA) before loading the population onto the {HHMb} column. Affi-Gel 10 (TM) or Affi-Gel 15 (TM) is used to immobilize BSA at the highest level the matrix will support. A 10.0 ml column is loaded with 5.0 ml of Affi-Gel-linked BSA; this column, called {BSA}, has Vv = 5.0 ml. The variegated population of virions containing 10^12 pfu in 1 ml (0.2 x Vv) of 10 mM KCl, 1 mM phosphate, pH 8.0 buffer is applied to {BSA}. We wash {BSA} with 4.5 ml (0.9 x Vv) of 50 mM KCl, 1 mM phosphate, pH 8.0 buffer. The wash with 50 mM salt will elute virions that adhere slightly to BSA but not virions with strong binding.

The pooled effluent of the {BSA} column is 5.5 ml of approximately 13 mM KCl.

20 

The column {HHMb} is first blocked by treatment with 10^11 virions of M13(am429) in 100 µl of 10 mM KCl buffered to pH 8.0 with phosphate; the column is washed with the same buffer until OD260 returns to baseline or 2 x Vv have passed through the column, whichever comes first. The pooled effluent from {BSA} is added to {HHMb} in 5.5 ml of 13 mM KCl, 1 mM phosphate, pH 8.0 buffer. The column is eluted (Sec. 15.3) in the following way:

30 

1) 10 mM KCl buffered to pH 8.0 with phosphate, until the optical density at 280 nm falls to baseline or 2 x Vv, whichever is first (effluent discarded),

2) a gradient of 10 mM to 2 M KCl in 3 x Vv, pH held at 8.0 with phosphate (30 x 100 µl fractions),

3) a gradient of 2 M to 5 M KCl in 3 x Vv, phosphate buffer to pH 8.0 (30 x 100 µl fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl in 2 x Vv, with phosphate buffer to pH 8.0 (20 x 100 µl fractions), and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1 x Vv, with phosphate buffer to pH 8.0 (10 x 100 µl fractions).
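The elution program above can be tabulated fraction by fraction. This is a sketch only. Vv of {HHMb} is not stated, but the fraction counts (3 x Vv collected as 30 x 100 µl) imply Vv of about 1.0 ml. The gradients are assumed to be ideal and linear, and `Step` and `fraction_schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kcl_start: float   # M KCl at the start of the step
    kcl_end: float     # M KCl at the end of the step
    gdm_start: float   # M guanidinium Cl at the start
    gdm_end: float     # M guanidinium Cl at the end
    n_fractions: int   # number of 100 µl fractions collected


# Steps 2-5 above; the step-1 effluent is discarded, so it is not listed.
PROGRAM = [
    Step(0.010, 2.0, 0.0, 0.0, 30),
    Step(2.0, 5.0, 0.0, 0.0, 30),
    Step(5.0, 5.0, 0.0, 0.8, 20),
    Step(5.0, 5.0, 0.8, 0.8, 10),
]


def fraction_schedule(program, fraction_ml=0.1):
    """Yield (fraction number, cumulative ml, M KCl, M GdmCl) at the end of
    each collected fraction, assuming ideal linear gradients."""
    number, volume = 0, 0.0
    for step in program:
        for i in range(1, step.n_fractions + 1):
            t = i / step.n_fractions            # progress through this step
            number += 1
            volume += fraction_ml
            kcl = step.kcl_start + t * (step.kcl_end - step.kcl_start)
            gdm = step.gdm_start + t * (step.gdm_end - step.gdm_start)
            yield number, volume, kcl, gdm


schedule = list(fraction_schedule(PROGRAM))
for number, ml, kcl, gdm in schedule[::10]:
    print(f"fraction {number:2d}  {ml:4.1f} ml  {kcl:5.3f} M KCl  {gdm:4.2f} M GdmCl")
```

A table like this makes it easy to read off the approximate salt concentration at which a given fraction eluted.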

In addition to the elution fractions, a sample is removed from the column and used as an inoculum for phage-sensitive sup+ cells (Sec. 15.4). A sample of 4 µl from each fraction is plated on phage-sensitive sup+ cells. Fractions that yield too many colonies to count are replated at greater dilution. An approximate titre of each fraction is calculated. Starting with the last fraction and working toward the first fraction that was titered, we pool fractions until approximately 10^9 phage are in the pool, i.e. about 1 part in 1000 of the phage applied to the column. This population is infected into 3 x 10^ phage-sensitive PE384 in 300 ml of LB broth. The low multiplicity of infection is chosen to reduce the possibility of multiple infection. After thirty minutes, viable phage have entered recipient cells but have not yet begun to produce new phage. Phage-borne genes are expressed at this phase, and we can add ampicillin, which will kill uninfected cells. These cells still carry F pili and will absorb phage, helping to prevent multiple infections.
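The pooling rule above (work backward from the last fraction until about 10^9 phage are collected) is a simple greedy loop. A sketch with made-up titres follows. The inoculum taken from the support matrix is not modeled, and `pool_from_last` is a hypothetical name.

```python
def pool_from_last(titres, target=1e9):
    """Pool fractions from the last one eluted toward the first, stopping
    once the pool holds at least `target` phage.

    titres: approximate phage per fraction, in elution order.
    Returns (pooled fraction indices in elution order, phage in pool).
    """
    pooled, total = [], 0.0
    for idx in reversed(range(len(titres))):
        if total >= target:
            break
        pooled.append(idx)
        total += titres[idx]
    return sorted(pooled), total


# Hypothetical titres for the last six titered fractions.
titres = [5e9, 2e9, 8e8, 3e8, 1e8, 2e7]
indices, total = pool_from_last(titres)
print(indices, f"{total:.2e} phage")
```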

If multiple infection should pose a problem that cannot be solved by growth at low multiplicity of infection on F+ cells, the following procedure can be employed to obviate the problem. Virions obtained from the affinity separation are infected into F+ E. coli and cultured to amplify the genetic messages (Sec. 15.5). CCC DNA is obtained either by harvesting RF DNA or by in vitro extension of primers annealed to ss phage DNA. The CCC DNA is used to transform F− cells at a high ratio of cells to DNA. Individual virions obtained in this way should bear proteins encoded only by the DNA within.
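The reason a low multiplicity of infection (or a high cell-to-DNA ratio) limits multiple uptake can be seen from the Poisson model usually assumed for random adsorption. Here is a short sketch. The MOI values are hypothetical, because the cell number in the text is illegible.

```python
import math


def multiple_infection_fraction(moi):
    """Among cells that take up at least one genome, the fraction that take
    up two or more, assuming Poisson-distributed uptake at mean `moi`."""
    p0 = math.exp(-moi)          # no particle
    p1 = moi * math.exp(-moi)    # exactly one particle
    return (1.0 - p0 - p1) / (1.0 - p0)


for moi in (0.01, 0.1, 1.0):
    print(f"moi {moi:<5} -> {multiple_infection_fraction(moi):.4f} multiply infected")
```

At low MOI the multiply infected share is roughly MOI/2, so an MOI of 0.1 leaves about 5% of infected cells carrying more than one genome.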

15 

The variegation produces as many as 6.4 x 10^7 different amino-acid sequences. C_eff is 900. Thus, after two separation cycles, the probability of isolating a single SBD is less than 0.10; after three cycles, the probability rises above 0.10.
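The text does not give the formula behind these thresholds. One simple model that reproduces them assumes each separation cycle enriches a single superior binder C_eff-fold relative to every other sequence. This is an illustrative assumption, and real selections are less ideal.

```python
def sbd_fraction(n_sequences, c_eff, cycles):
    """Fraction of the population made up by one superior binder after
    `cycles` separations, if each cycle enriches it c_eff-fold relative
    to every other sequence (a deliberately simple model)."""
    boost = c_eff ** cycles
    return boost / (boost + (n_sequences - 1))


N = 6.4e7     # distinct amino-acid sequences (20**6)
C_EFF = 900
for cycles in (1, 2, 3):
    print(cycles, round(sbd_fraction(N, C_EFF, cycles), 4))
```

Under this model the binder makes up about 1% of the population after two cycles and about 90% after three.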

The phagemid population is grown and chromatographed three times and then examined for SBDs (Sec. 15.7). In each separation cycle, phage from the last three fractions that contain viable phage are pooled with phage obtained by removing some of the support matrix as an inoculum. At each cycle, about 10^12 phage are loaded onto the column and about 10^9 phage are cultured for the next separation cycle.

After the third separation cycle, 32 colonies are picked from the last fraction that contained viable phage; phage from these colonies are denoted SBD1, SBD2, ..., and SBD32.

Each of the SBDs is cultured and tested for retention on a Pep-Tie column supporting HHMb (Sec. 15.8). Phage LG7 (SBD11) shows the greatest retention on the Pep-Tie {HHMb} column, eluting at 367 mM KCl, while wtM13 elutes at 20 mM KCl. SBD11 becomes the parental amino-acid sequence for the second variegation cycle.

The result of this hypothetical experiment is shown in Table 38. R40 changed to D, I42 changed to Q, A50 changed to E, L52 remained L, and A71 changed to W.


The next round of variegation (Sec. 16) is illustrated in Table 39. The residues to be varied are chosen by: a) choosing some of the residues in the principal set that were not varied in the first round (viz. residues 42, 44, 51, 54, 55, 72, or 75 of the fusion), and b) choosing some residues in the secondary set. Residues 51, 54, 55, and 72 are varied through all twenty amino acids and, unavoidably, stop. Residue 44 is varied only between Y and F. Some residues in the secondary set are varied through a restricted range, primarily to allow different charges (+, 0, -) to appear. Residue 38 is varied through K, R, E, or G. Residue 41 is varied through I, V, K, or E. Residue 43 is varied through R, S, G, N, K, D, E, T, or A.
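The number of distinct amino-acid sequences this scheme can produce is not stated. It follows from multiplying the choices at each varied position, as in the sketch below, which ignores stop codons and codon bias.

```python
from math import prod

# Amino-acid choices per position in the second variegation round.
# Stop codons at the fully varied positions are not counted.
SECOND_ROUND = {
    51: 20, 54: 20, 55: 20, 72: 20,  # all twenty amino acids
    44: 2,                           # Y or F
    38: 4,                           # K, R, E, G
    41: 4,                           # I, V, K, E
    43: 9,                           # R, S, G, N, K, D, E, T, A
}

diversity = prod(SECOND_ROUND.values())
transformants = 1e9  # target stated for this round
print(f"{diversity:,} sequences; {transformants / diversity:.0f} transformants per sequence")
```

About 4.6 x 10^7 sequences against 10^9 transformants gives roughly twenty transformants per sequence on average. Uneven codon abundances make the real coverage lower for the least-favored combinations.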


Olig#29 and olig#30 are synthesized, annealed, extended, and cloned into pLG7 at the Apa I/Sph I sites. The ligation mixture is used to transform 5 l of competent PE383 cells so that 10^9 transformants are obtained. A new {HHMb} is constructed using the same support matrix as was used in round 1. A sample of 10^12 of the harvested LG7 is applied to {HHMb} and affinity separated. The last 10^9 phage off the column and an inoculum are pooled and cultured. The cultured phagemids are re-chromatographed for three separation cycles. Thirty-two clonal isolates (denoted SBD11-1, SBD11-2, ..., SBD11-32) are obtained from the effluent of the third separation cycle and tested for binding on a Pep-Tie {HHMb} column. Of this set, SBD11-23 shows the greatest retention on the Pep-Tie {HHMb} column, eluting at 692 mM KCl.

The results of this hypothetical selection are shown in Table 40. Residue 38 (K15 of BPTI) changed to E, 41 becomes V, 43 goes to N, 44 goes to F, 51 goes to F, 54 goes to S, 55 goes to A, and 72 goes to Q.

The sbd11-23 portion of the osp-pbd gene is cloned into an expression vector and BPTI(E15, D17, V18, Q19, N20, F21, E27, F28, L29, S31, A32, S34, W71, Q72) is expressed in the periplasm. This protein is isolated by standard methods and its binding to HHMb is tested. K_d is found to be 4.5 x 10^ M.

A third round of variation, using SBD11-23 as PPBD, is illustrated in Table 41; eight amino acids are varied. Those in the principal set, residues 40, 55, and 57, are varied through all twenty amino acids. Residue 32 is varied through P, Q, T, K, A, or E. Residue 34 is varied through T, P, Q, K, A, or E. Residue 44 is varied through F, L, Y, C, W, or stop. Residue 50 is varied through E, K, or Q. Residue 52 is varied through L, F, X, M, or V.

The result of this variation is shown in Table 42. The selected SBD is denoted SBD11-23-5 and elutes from a Pep-Tie {HHMb} column at 980 mM KCl. The sbd11-23-5 segment is cloned into an expression vector and BPTI(E9, Q11, E15, A17, V18, Q19, N20, W21, Q27, F28, M29, S31, L32, H34, W71, Q72) is produced. This time the K_d is 7.3 x 10^-9 M.

This example is hypothetical. It is anticipated that more variegation cycles will be needed to achieve dissociation constants of 10^-8 M. It is also possible that more than three separation cycles will be needed in some variegation cycles. Real DNA chemistry and DNA synthesizers may have larger errors than our hypothetical 5%. If S_err > 0.05, then we may not be able to vary six residues at once. Variation of 5 residues at once is certainly possible.
Outer-Surface Proteins

Reason for preference

a) exposed amino terminus,
b) predictable post-translational processing,
c) numerous copies in virion.

fusion data available

a) known to be on virion exterior,
b) small enough that the G-ipbd gene can replace H gene.

a) fusion data available, non-essential

a) no post-translational processing,
b) distinctive sequence that causes protein to localize in spore coat,
c) non-essential.

Same as for cotC
Table 7: Atomic radii (Angstroms)

C alpha        1.70
O carbonyl     1.52
N amide        1.55
Other atoms

Table 8

Fraction of DNA molecules having n non-parental bases when synthesized with reagents that have fraction M of parental nt.

M       .9965     .97716    .92612    .8577     .79433    .63096
f0      .9000     .5000     .1000     .0100     .0010     .000001
f1      .09499    .35061    .2393     .04977    .00777    .0000175
f2      .00485    .1188     .2768     .1197     .0292     .000149
f3      .00016    .0259     .2061     .1854     .0705     .000812
f4      .000004   .00409    .1110     .2077     .1232     .003207
f8      0.        2x10^-7   .00096    .0336     .1182     .080165
f16     0.        0.        0.        5x10^-7   .00006    .027281
f23     0.        0.        0.        0.        0.        .0000089
most    0         0         2         5         7         12

"most" is the value of n having the highest probability.
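The f_n entries appear to follow a binomial distribution over 30 synthesized bases. For example, 0.9965^30 is about 0.90 and 0.97716^30 is about 0.50, matching f0. The 30-base length is our inference from the f0 row and is not stated in the text. A sketch under that assumption:

```python
from math import comb


def frac_with_n_errors(M, n, length=30):
    """Binomial probability that a molecule of `length` bases carries exactly
    n non-parental bases when each base is parental with probability M.
    length=30 is inferred from Table 8's f0 row, not stated in the text."""
    p_err = 1.0 - M
    return comb(length, n) * p_err**n * M**(length - n)


for M in (0.9965, 0.97716, 0.92612, 0.8577):
    print(M, [round(frac_with_n_errors(M, n), 4) for n in range(5)])
```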
Table 9: best vgCodon

Program "Find Optimum vgCodon."

INITIALIZE-MEMORY-OF-ABUNDANCES
DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment calculate g1 from other concentrations
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R+K
. . . . . . g2 = (g1*a2 - .5*a1*a2)/(c1 + .5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( g2 .gt. 0.1 .and. t2 .gt. 0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_IF_block ! if g1 big enough
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1
WRITE the best distribution and the abundances.
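The g2 line ("Force D+E = R+K") can be checked against the standard genetic code. With the third position k split equally between T and G, the codons involved are GAT/GAG (D, E), CGT/CGG/AGG (R) and AAG (K). Setting the two sums equal and solving for g2 gives the expression used in the program. The derivation below is our reconstruction, not text from the source:

```latex
\begin{aligned}
\mathrm{D}+\mathrm{E} &= g_1 a_2\,(t_3 + g_3) = g_1 a_2,\\
\mathrm{R}+\mathrm{K} &= c_1 g_2\,(t_3 + g_3) + a_1 g_2\, g_3 + a_1 a_2\, g_3
  = c_1 g_2 + \tfrac{1}{2} a_1 g_2 + \tfrac{1}{2} a_1 a_2
  \qquad (t_3 = g_3 = \tfrac{1}{2}),\\
\mathrm{D}+\mathrm{E} = \mathrm{R}+\mathrm{K}
  &\;\Longrightarrow\;
  g_2 = \frac{g_1 a_2 - \tfrac{1}{2} a_1 a_2}{c_1 + \tfrac{1}{2} a_1}.
\end{aligned}
```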
Table 10: Abundances obtained from optimum vgCodon

Amino acid   Abundance          Amino acid   Abundance
A            4.80%              C            2.86%
D            6.00%              E            6.00%
F            2.86%              G            6.60%
H            3.60%              I            2.86%
K            5.20%              L            6.82%
M            2.86%              N            5.20%
P            2.88%              Q            3.60%
R            6.82%              S            7.02%  mfaa
T            4.16%              V            6.60%
W            2.86%  lfaa        Y            5.20%
stop         5.20%

ratio = Abun(W)/Abun(S) = 0.4074

n    (1/ratio)^n    (ratio)^n       stop-free
1    2.454          .4074           .9480
2    6.025          .1660           .8987
3    14.788         .0676           .8520
4    36.298         .0275           .8077
5    89.095         .0112           .7657
6    218.7          4.57 x 10^-3    .7258
7    536.8          1.86 x 10^-3    .6881

lfaa = least-favored amino acid; mfaa = most-favored amino acid
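Tables 9 and 11 leave CALCULATE-ABUNDANCES unspecified. Here is one way to implement it, as a sketch: multiply the base fractions at the three codon positions, translate with the standard genetic code, and sum per amino acid, with the third position restricted to T and G (k). The q, f and k fractions are our inference from the "qfk" oligo design described earlier. With them, the sketch reproduces Table 10 (for example W 2.86%, S 7.02%, stop 5.20%, ratio 0.4074). The ratio reported in Table 10 appears to be the quantity the search compares.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + m]
    for (i, b1), (j, b2), (m, b3) in product(enumerate(BASES), repeat=3)
}


def abundances(pos1, pos2, pos3):
    """Amino-acid abundances (stop as '*') from the base fractions at each
    codon position; each posN maps base -> fraction."""
    result = {}
    for codon, aa in CODE.items():
        p = (pos1.get(codon[0], 0.0) * pos2.get(codon[1], 0.0)
             * pos3.get(codon[2], 0.0))
        result[aa] = result.get(aa, 0.0) + p
    return result


# q, f, k as implied by the "qfk" oligo design described earlier (an inference).
q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
f = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
k = {"T": 0.50, "G": 0.50}

ab = abundances(q, f, k)
amino_only = {aa: v for aa, v in ab.items() if aa != "*"}
ratio = min(amino_only.values()) / max(amino_only.values())
print(f"W {ab['W']:.2%}  S {ab['S']:.2%}  stop {ab['*']:.2%}  ratio {ratio:.4f}")
```

Several amino acids (F, M, W, C, I) tie at 2.86%, so which one is labelled lfaa is a matter of convention.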
Table 11: Calculate worst codon.

Program "Find worst vgCodon within Serr of given distribution."

INITIALIZE-MEMORY-OF-ABUNDANCES
Comment Serr is % error level.
READ Serr
Comment T1i, C1i, A1i, G1i, T2i, C2i, A2i, G2i, T3i, G3i are the intended nt-distribution
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1. - Serr
Fup = 1. + Serr
DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps )
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps )
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps )
. . . g1 = 1. - t1 - c1 - a1
. . . IF ( (g1-G1i)/G1i .lt. -Serr )
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . IF ( (g1-G1i)/G1i .gt. Serr )
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps )
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps )
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps )
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF ( (t2-T2i)/T2i .lt. -Serr )
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF ( (t2-T2i)/T2i .gt. Serr )
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF ( g2 .gt. 0.0 .and. t2 .gt. 0.0 )
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! g2
. . . . end_DO_loop ! c2
. . . end_DO_loop ! a2
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1
WRITE the WORST distribution and the abundances.
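Each "push it back" block in Table 11 does the same thing. If the dependent fraction (g1 or t2) drifts more than S_err from its intended value, it is pinned to the nearer bound, and the three directly stepped fractions are rescaled so that the four still sum to 1. A sketch of that step follows. The function name and the example values are hypothetical.

```python
def clamp_and_rescale(free, dependent, intended, s_err):
    """One 'push it back' block from Table 11.

    free: the three base fractions stepped directly (e.g. t1, c1, a1).
    dependent: the fourth fraction, 1 - sum(free).
    If `dependent` is more than s_err (relative) from `intended`, it is set
    to the nearer bound and the free fractions are rescaled so all four
    still sum to 1. Returns (new free dict, new dependent value).
    """
    rel = (dependent - intended) / intended
    if rel < -s_err:
        dependent = intended * (1.0 - s_err)
    elif rel > s_err:
        dependent = intended * (1.0 + s_err)
    else:
        return dict(free), dependent
    factor = (1.0 - dependent) / sum(free.values())
    return {base: v * factor for base, v in free.items()}, dependent


# Hypothetical case: t1, c1, a1 all at the top of their +5% range,
# which drives g1 below its allowed range (intended G1i = 0.30).
free = {"t1": 0.26 * 1.05, "c1": 0.18 * 1.05, "a1": 0.26 * 1.05}
g1 = 1.0 - sum(free.values())
free, g1 = clamp_and_rescale(free, g1, intended=0.30, s_err=0.05)
print(round(g1, 4), round(sum(free.values()) + g1, 10))
```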
Table 12: Abundances obtained using optimum vgCodon assuming 5% errors

Amino acid   Abundance          Amino acid   Abundance
A            4.59%              C            2.76%
D            5.45%              E            6.02%
F                               G            6.63%
H            3.59%              I            2.71%
K            5.73%              L            6.71%
M            3.00%              N            5.19%
P            3.02%              Q            3.97%
R            7.68%  mfaa        S            7.01%
T            4.37%              V            6.00%
W            3.05%              Y            4.77%
stop         5.27%

ratio = Abun(F)/Abun(R) = 0.3248

n    (1/ratio)^n    (ratio)^n       stop-free
1    3.079          .3248           .9473
2    9.481          .1055           .8973
3    29.193         .03425          .8500
4    89.888         .01112          .8052
5    276.78         3.61 x 10^-3    .7627
6    852.22         1.17 x 10^-3    .7225
7    2624.1         3.81 x 10^-4    .6844
Table 13: BPTI Homologues




R # 

1 

2 

3 

4 

5 

—3 

— 

- 

- 

F 


-2 

— 

- 

- 

Q 

T 

-1 

— 

— 

— 

T 

E 

1 

R 

R 

R 

P 

R 

2 

P 

P 

P 

P 

P 

3 

D 

D 

D 

D 

D 

4 

F 

F 

F 

L 

F 

5 

C 

C 

C 

C 

C 

6 

L 

L 

L 

Q 

L 

7 

E 

E 

E 

L 

E 

8 

P 

P 

P 

P 

P 

9 

P 

P 

P 

Q 

P 

10 

Y 

Y 

Y 

A 

Y 

11 

T 

T 

T 

R 

T 

12 

G 

G 

G 

G 

G 

13 

P 

P 

P 

P 

P 

14 

C 

T 

A 

C 

C 

15 

K 

K 

K 

K 

K 

16 

A 

A 

A 

A 

A 

17 

R 

R 

R 

A 

A 

18 

I 

I 

I 

L 

M 

19 

I 

I 

I 

L 

I 

20 

R 

R 

R 

R 

R 

21 

Y 

Y 

Y 

Y 

Y 

22 

F 

F 

F 

F 

F 

23 

Y 

Y 

Y 

Y 

Y 

24 

N 

N 

N 

N 

N 

25 

A 

A 

A 

S 

A 

26 

K 

K 

K 

T 

K 

27 

A 

A 

A 

S 

A 

28 

G 

G 

G 

N 

G 

29 

L 

L 

L 

A 

F 

30 

C 

C 

C 

C 

C 

31 

Q 

Q 

Q 

E 

E 

32 

T 

T 

T 

P 

T 

33 

F 

F 

F 

F 

F 

34 

V 

V 

V 

T 

V 

35 

Y 

Y 

Y 

Y 

Y 

36 

G 

G 

G 

G 

G 

37 

G 

G 

G 

G 

G 

38 

C 

T 

A 

C 

C 

39 

R 

R 

R 

Q 

R 

40 

A 

A 

A 

G 

A 

41 

K 

K 

K 

N 

K 

42 

R 

R 

R 

N 

S 

43 

N 

N 

N 

N 

N 


6 7 8 9 10 11 12 


— 

- 

- 

- 

- 

- 

Q 

R 

R 

R 

R 

R 

R 

F 

L 

P 

P 

P 

P 

p 

p 

R 

D 

D 

D 

D 

D 

D 

K 

F 

F 

F 

F 

F 

F 

L 

c 

c 

c 

c 

C 

C 

C 

L 

L 

L 

L 

L 

X. 

i 

E 

E 

E 

E 

E 

E 

L 

p 

p 

P 

p 

p 

p 

H 

p 

P 

p 

p 

p 

p 

R 

Y 

Y 

Y 

Y 

Y 

Y 

N 

T 

T 

T 

T 

T 

T 

P 

G 

G 

G 

G 

G 

G 

G 

p 

P 

p 

p 

P 

p 

R 

C 

C 

C 

c 

c 

C 

c 

V 

G 

A 

L 

I 

K 

Y 

A 

A 

A 

A 

A 

A 

Q 

R 

R 

R 

R 

R 

R 

K 

X 

i' 

i 

i 

i 

i 

i 

I 

i 

I 

i 

I 

i 

P 

R 

R 

R 

R 

R 

R 

A 

Y 

Y 

Y 

Y 

Y 

Y 

F 

F 

F 

F 

F 

F 

F 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

N 

N 

N 

N 

N 

N 

N 

A 

A 

A 

A 

A 

A 

Q 

K 

K 

K 

K 

K 

K 

K 

A 

A 

A 

A 

A 

A 

K 

G 

G 

G 

G 

G 

G 

K 

L 

L 

L 

L 

L 

L 

Q 

C 

c 

C 

c 

c 

c 

c 

Q 

Q 

Q 

Q 

Q 

Q 

E 

T 

T 

T 

T 

T 

T 

G 

F 

F 

F 

F 

F 

F 

F 

V 

V 

V 

V 

V 

V 

T 

Y 

Y 

Y 

Y 

Y 

Y 

W 

G 

G 

G 

G 

G 

G 

S 

G 

G 

G 

G 

G 

G 

G 

c 

C 

c 

C 

c 

c 

C 

R 

R 

R 

R 

R 

R 

G 

A 

A 

A 

A 

A 

A 

G 

K 

K 

K 

K 

K 

K 

N 

R 

R 

R 

R 

R 

R 

s 

N 

N 

N 

N 

N 

N 

N 


13 

14 

15 

16 

17 

18 

19 

- 

— 

— 

— 

Z 

— 

— 

— 

— 

— 

H 

G 

Z 

— 

— 

— 

— 

D 

D 

G 

- 

A 

R 

R 

R 

K 

R 

A 

A 

P 

P 

P 

R 

P 

A 

K 

D 

R 

T 

D 

S 

K 

Y 

F 

F 

F 

I 

F 

Y 

C 

C 

C 

C 

C 

C 

C 

K 

E 

E 

N 

R 

N 

K 

L 

L 

L' 

L 

L 

L 

L 

P 

P 

P 

P 

P 

P 

P 

L 

A 

A 

P 

P 

A 

V 

R 

E 

E 

E 

E 

E 

R 

I 

T 

T 

S 

Q 

T 

Y 

G 

G 

G 

G 

G 

G 

G 

P 

L 

L 

R 

P 

P 

P 

C 

C 

C 

C 

C 

C 

C 

K 

K 

K 

R 

K 

K 

K 

R 

A 

A 

G 

G 

A 

K 

K 

Y 

R 

H 

R 

S 

K 

I 

I 

I 

I 

L 

I 

F 

P 

R 

R 

R 

P 

R 

P 

S 

S 

S 

R 

R 

Q 

S 

F 

F 

F 

X 

Y 

Y 

F 

Y 

H 

H 

Y 

F 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

K 

N 

N 

N 

N 

N 

N 

W 

L 

R 

L 

P 

S 

W 

K 

A 

A 

E 

A 

K 

K 

A 

A 

A 

S 

S 

S 

A 

K 

Q 

Q 

N 

R 

G 

K 

Q 

Q 

Q 

K 

M 

G 

Q 

C 

C 

C 

C 

C 

C 

C 

L 

L 

L 

K 

E 

Q 

L 

P 

Q 

E 

V 

S 

Q 

P 

F 

F 

F 

F 

F 

F 

F 

D 

I 

I 

F 

I 

I 

N 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

S 

G 

G 

G 

G 

G 

S 

G 

G 

G 

G 

G 

G 

G 

C 

C 

C 

C 

C 

C 

C 

G 

G 

G 

G 

K 

R 

G 

G 

G 

G 

G 

G 

G 

G 

N 

N 

N 

N 

N 

N 

N 

A 

A 

A 

A 

K 

Q 

A 

N 

N 

N 

N 

N 

N 

N 
Table 13, continued.








R # 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

44 

N 

N 

N 

N 

N 

N 

N 

N 

N 

N 

N 

R 

R 

R 

R 

N 

N 

R 

R 

45 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

46 

K 

K 

K 

E 

K 

K 

K 

K 

K 

K 

K 

K 

K 

K 

K 

E 

K 

D 

K 

47 

S 

S 

S 

T 

S 

S 

S 

S 

S 

S 

S 

T 

T 

T 

T 

T 

T 

T 

T 

48 

A 

A 

A 

T 

A 

A 

A 

A 

A 

A 

A 

I 

I 

I 

I 

R 

K 

T 

I 

49 

E 

E 

E 

E 

E 

E 

E 

E 

E 

E 

E 

E 

E 

D 

D 

D 

A 

Q 

E 

50 

D 

D 

D 

M 

D 

D 

D 

D 

D 

D 

D 

E 

E 

E 

E 

E 

E 

Q 

E 

51 

c 

c 

C 

c 

c 

c 

c 

c 

c 

C 

c 

C 

c 

C 

C 

c 

c 

C 

C 

52 

M 

M 

M 

L 

M 

M 

M 

M 

M 

M 

E 

R 

R 

R 

H 

R 

V 

Q 

R 

53 

R 

R 

R 

R 

R 

R 

R 

R 

R 

R 

R 

R 

R 

R 

R 

E 

R 

G 

R 

54 

T 

T 

T 

I 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

T 

A 

V 

T 

55 

c 

c 

c 

c 

c 

C 

C 

c 

c 

c 

c 

C 

c 

c 

c 

C 

c 

C 

c 

56 

G 

G 

G 

E 

G 

G 

G 

G 

G 

G 

G 

I 

V 

V 

V 

G 

R 

V 

V 

57 

G 

G 

G 

P 

G 

G 

G 

G 

G 

G 

G 

R 

G 

G 

G 

G 

P 

— 

G 

58 

A 

A 

A 

P 

A 

A 

A 

A 

A 

A 

A 

K 

— 

— 

— 

K 

P 

— 

- 

59 

— 

— 

— 

Q 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

E 

— 

— 

60 

— 

— 

— 

Q 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

— 

R 

— 

— 

61 

- 

— 

— 

T 

— 

— 

— 

— 

- 

— 

— 

- 

- 

— 

- 

- 

P 

— 

- 

62 

— 

- 

— 

D 

- 

— 

— 

- 

— 

- 

- 

- 

— 

— 

— 

— 

- 

— 

- 

63 

— 

— 

— 

K 

- 

— 

— 

- 

— 


— 

— 

— 

— 

— 

— 

— 

— 

- 

64 
S -

R # = residue number
















1 BPTI

2 Engineered BPTI from MARK87

3 Engineered BPTI from MARK87

4 Bovine colostrum (DUFT85)

5 Bovine serum (DUFT85)

6 Semisynthetic BPTI, TSCH87

7 Semisynthetic BPTI, TSCH87

8 Semisynthetic BPTI, TSCH87

9 Semisynthetic BPTI, TSCH87

10 Semisynthetic BPTI, TSCH87

11 Engineered BPTI, AUER87

12 Dendroaspis polylepis polylepis (black mamba) venom I (DUFT85)

13 Dendroaspis polylepis polylepis (black mamba) venom K (DUFT85)

14 Hemachatus haemachatus (Ringhals cobra) HHV II (DUFT85)

15 Naja nivea (Cape cobra) NNV II (DUFT85)

16 Vipera russelli (Russell's viper) RVV II (TAKA74)

17 Red sea turtle egg white (DUFT85)

18 Snail mucus (Helix pomatia) (WAGN78)

19 Dendroaspis angusticeps (Eastern green mamba) C13 S1 C3 toxin (DUFT85)
Table 13, continued.

R #   20 21 22 23 24 25 26 27 28 29 30 31 32 33

-5 

— 

— 

— 

— 

- 

— 

— 

- 

— 

— 

— 

— 

- 

D 

-4 

— 

— 

— 

- 

- 

— 

- 

— 

— 

— 

— 

— 

— 

E 

-3 

— 

— 

— 

— 

- 

- 

— 

— 

— 

— 

— 

— 

T 

P 

-2 

Z 

— 

L 

Z 

R 

K 

— 

- 


R 

R 

- 

E 

T 

-1 

P 

— 

Q 

D 

D 

N 

- 

- 

- 

Q 

K 

— 

R 

T 

1 

R 

R 

H 

H 

R 

R 

I 

K 

T 

R 

R 

R 

G 

D 

2 

R 

P 

R 

P 

P 

P 

N 

E 

V 

H 

H 

P 

F 

L 

3 

K 

Y 

T 

K 

K 

T 

G 

D 

A 

R 

P 

D 

L 

P 

4 

L 

A 

F 

F 

F 

F 

D 

S 

A 

D 

D 

F 

D 

I 

5 

c 

C 

c 

c 

C 

c 

c 

c 

c 

C 

c 

C 

C 

c 

6 

I 

E 

K 

Y 

Y 

N 

E 

Q 

N 

D 

D 

L 

T 

E 

7 

L 

L 

L 

L 

L 

L 

L 

L 

L 

K 

K 

E 

S 

Q 

8 

H 

I 

P 

P 

P 

L 

P 

G 

P 

P 

P 

P 

p 

A 

9 

R 

V 

A 

A 

A 

P 

K 

Y 

V 

P 

P 

P 

P 

FG 

10 

N 

A 

E 

D 

D 

E 

V 

s 

I 

D 

D 

Y 

V 

D 

11 

P 

A 

P 

P 

p 

T 

V 

A 

R 

K 

T 

T 

T 

A 

12 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

K 

G 

G 

G 

13 

R 

P 

P 

R 

R 

R 

p 

p 

p 

N 

X 

p 

P 

L 

14 

C 

C 

c 

C 

C 

C 

c 

c 

C 

C 

c 

c 

C 

c 

15 

Y 

M 

K 

K 

L 

N 

R 

M 

R 

— 

— 

K 

R 

F 

16 

D 

F 

A 

A 

A 

A 

A 

G 

A 

G 

Q 

A 

A 

G 

17 

K 

F 

S 

H 

Y 

Xj 

R 

M 

F. 

P 

T 

K 

G 

Y 

18 

I 

I 

I 

I 

M 

I 

F 

T 

I 

V 

V 

M 

F 

M 

19 

P 

S 

P 

P 

P 

P 

P 

S 

Q 

R 

R 

X 

K 

K 

20 

A 

A 

A 

R 

R 

A 

R 

R 

L 

A 

A 

R 

R 

L 

21 

F 

F 

F 

F 

F 

F 

Y 

Y 

w 

F 

F 

Y 

Y 

Y 

22 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

F 

A 

Y 

Y 

F 

N 

s 

23 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

F 

Y 

Y 

Y 

Y 

Y 

24 

N 

S 

N 

D 

N 

N 

N 

N 

D 

D 

K 

N 

N 

N 

25 

Q 

K 

w 

S 

p 

S 

s 

G 

A 

T 

P 

A 

T 

Q 

26 

K 

G 

A 

A 

A 

H 

s 

T 

V 

R 

s 

K 

R 

E 

27 

K 

A 

A 

S 

s 

L 

s 

S 

K 

L 

A 

A 

T 

T 

28 

K 

N 

K 

N 

N 

H 

K 

M 

G 

K 

K 

G 

K 

K 

29 

Q 

K 

K 

K 

K 

K 

R 

A 

K 

T 

R 

F 

Q 

N 

30 

c 

C 

C 

C 

c 

C 

C 

C 

c 

c 

c 

C 

c 

c 

31 

E 

Y 

Q 

N 

E 

Q 

E 

E 

V 

K 

V 

E 

E 

E 

32 

R 

P 

I. 

K 

K 

K 

K 

T 

L 

A 

Q 

T 

P 

E 

33 

p 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

34 

D 

rji 

H 

i 

I 

N 

i 

Q 

P 

Q 

R 

V 

K 

I 

35 

w 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

Y 

36 

S 

S 

G 

G 

G 

G 

G 

G 

G 

R 

G 

G 

G 

G 

37 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

38 

c 

C 

C 

C 

C 

C 

C 

C 

C 

C 

c 

c 

c 

C 

39 

6 

R 

K 

P 

R 

G 

G 

M 

Q 

D 

D 

K 

K 

Q 

40 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

G 

A 

G 

G 

41 

N 

N 

N 

N 

N 

N 

N 

N 

N 

D 

D 

K 

N 

N 

42 

s 

A 

A 

A 

A 

A 

A 

G 

G 

H 

H 

s 

G 

D 

43 

N 

N 

N 

N 

N 

N 

N 

N 

N 

G 

G 

N 

N 

N 
Table 13, continued.


R # 

20 

21 

22 

23 

24 

25 

26 

27 

28 

29 

30 

31 

32 

33 

44 

R 

R 

R 

N 

N 

N 

N 

N 

K 

N 

N 

N 

R 

R 

45 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

F 

Y 

F 

46 - 

K 

K 

S 

K 

K 

K 

H 

V 

Y 

K 

K 

R 

K 

S 

47 

T 

T 

T 

T 

T 

T 

T 

T 

S 

T 

S 

S 

S 

T 

48 

I 

I 

I 

W 

W 

1 

L 

E 

E 

E 

D 

A 

E 

L 

49 

E 

E 

E 

D 

D 

D 

E 

K 

K 

T 

H 

E 

Q 

A 

50 

E 

E 

K 

E 

E 

E 

E 

E 

E 

L 

L 

D 

D 

E 

51 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

52 

R 

R 

R 

R 

R 

Q 

E 

L 

R 

R 

R 


L 

E 

53 

R 

R 

H 

Q 

H 

R 

K 

Q 

E 

C 

C 

R 

D 

Q 

54 

T 

T 

A 

T 

T 

T 

V 

T 

Y 

E 

E 

T 

A 

K 

55 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

C 

56 

X 

V 

V 

G 

V 

A 

G 

R 

G 

L 

E 

G 

S 

I 

57 

G 

V 

G 

A 

A 

A 

V 

— 

V 

V 

L 

G 

G 

N 

58 

— 

— 

— 

S 

S 

K 

R 

- 

P 

Y 

Y 

A 

F 

— 

59 

— 

- 

- 

A 

G 

Y 

S 

— 

G 

P 

R 

— 

— 

— 

60 

— 

— 

— 

— 

i 

G 

— 

— 

D 

— 

— 

— 

— 

— 


20 Dendroaspis angusticeps (Eastern green mamba) C13 S2 C3 toxin (DUFT85)

21 Dendroaspis polylepis polylepis (black mamba) B toxin (DUFT85)

22 Dendroaspis polylepis polylepis (black mamba) E toxin (DUFT85)

23 Vipera ammodytes TX toxin (DUFT85)

24 Vipera ammodytes CTI toxin (DUFT85)

25 Bungarus fasciatus VIII B toxin (DUFT85)

26 Anemonia sulcata (sea anemone) 5 II (DUFT85)

27 Homo sapiens HI-14 "inactive" domain (DUFT85)

28 Homo sapiens HI-14 "active" domain (DUFT85)

29 beta-bungarotoxin B1 (DUFT85)

30 beta-bungarotoxin B2 (DUFT85)

31 Bovine spleen TI II (FIOR85)

32 Tachypleus tridentatus (horseshoe crab) hemocyte inhibitor (NAKA87)

33 Bombyx mori (silkworm) SCI-III (SASA84)

Notes:

a) both beta-bungarotoxins have residue 15 deleted.

b) B. mori has an extra residue between C5 and C14; we have assigned F and G to residue 9.

c) all natural proteins have C at 5, 14, 30, 38, 51, and 55.

d) all homologues have F33 and G37.

e) extra C's in bungarotoxins form interchain cystine bridges.
Table 14: Tally of Ionizable Groups, BPTI homologues

Seq. ID   D   E   K   R   Y   H   NH   CO2    +    #
1         2   2   4   6   4   0   1    1      6   16
2         2   2   4   6   4   0   1    1      6   16
3         2   2   4   6   4   0   1    1      6   16
4         2   4   2   3   3   0   1    1     -1   13
5         2   4   4   4   4   0   1    1      2   16
6         2   2   3   6   4   0   1    1      5   15
7         2   2   3   6   4   0   1    1      5   15
8         2   2   3   6   4   0   1    1      5   15
9         2   2   3   6   4   0   1    1      5   15
10        2   2   3   6   4   0   1    1      5   15
11        2   3   4   6   4   0   1    1      5   19
12        0   3   7   7   3   1   1    1     11   19
13        1   2   8   5   4   0   1    1     10   18
14        2   3   2   5   3   1   1    1      2   14
15        1   4   2   7   2   2   1    1      4   16
16        2   5   3   7   3   2   1    1      3   19
17        2   4   6   7   3   0   1    1      7   21
18        1   1   2   4   4   0   1    1      4    8
19        0   2   9   4   4   0   1    1     11   17
20        2   3   6   7   3   1   1    1      8   20
21        0   3   3   5   5   0   1    1      5   13
22        0   2   6   3   3   2   1    1      7   13
23        4   1   5   3   4   2   1    1      3   15
24        3   2   4   6   5   1   1    1      5   17
25        1   2   5   3   3   1   1    1      5   13
26        1   5   4   4   4   1   1    1      2   16
27        1   4   2   2   4   0   1    1     -1   11
28        2   3   4   3   3   0   1    1      2   14
29        6   2   5   7   4   2   1    1      4   22
30        6   2   6   7   4   2   1    1      5   23
31        2   3   5   4   4   0   1    1      4   16
32        3   3   5   5   4   0   1    1      4   18
33        4   7   3   1   4   0   1    1     -7   17

Sequences given in Table 13.

NH and CO2 are the terminal amino and carboxyl groups.

+ is the sum K + R + NH - D - E - CO2, the approximate charge on the molecule at pH 7.0.

# is the sum K + R + NH + D + E + CO2, i.e. the number of ionized groups at pH 7.0.
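The two summary columns follow directly from the footnote definitions. Tyr and His are tallied but do not enter either sum. A minimal sketch that recomputes them from a row follows. The class name is hypothetical. Recomputing every row this way is a quick consistency check and flags entries worth comparing against the original print. For instance, the # values shown for sequences 11 and 18 do not equal the sum of their own D, E, K and R counts.

```python
from dataclasses import dataclass


@dataclass
class IonTally:
    D: int
    E: int
    K: int
    R: int
    nh: int = 1    # N-terminal amino group
    co2: int = 1   # C-terminal carboxyl group

    def net_charge(self):
        """'+' column: approximate net charge at pH 7.0."""
        return self.K + self.R + self.nh - self.D - self.E - self.co2

    def n_ionized(self):
        """'#' column: number of ionized groups at pH 7.0."""
        return self.K + self.R + self.nh + self.D + self.E + self.co2


bpti = IonTally(D=2, E=2, K=4, R=6)       # sequence 1 in Table 14
silkworm = IonTally(D=4, E=7, K=3, R=1)   # sequence 33
print(bpti.net_charge(), bpti.n_ionized())
print(silkworm.net_charge(), silkworm.n_ionized())
```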
Table 15: Amino acids observed at each Residue in BPTI homologues

Res. #  Number of      Contents                            BPTI
        different AAs
  -5     2             D -32                               -
  -4     2             E -32                               -
  -3     5             T P F Z -29                         -
  -2    10             Z3 R3 Q2 T2 H G L K E -18           -
  -1    10             D4 T2 P2 Q2 E G N K R -18           -
   1    10             R21 A2 K2 H2 P L I T G D            R
   2     9             P20 R4 A2 H2 N E V F L              P
   3    10             D15 K6 T3 R2 P2 S Y G A L           D
   4     7             F19 D4 L3 Y2 I2 A2 S                F
   5     1             C33                                 C
   6    10             L11 E5 N4 K3 Q2 I2 Y2 D2 T R        L
   7     5             L18 E11 K2 S Q                      E
   8     7             P26 H2 A2 I L G F                   P
   9     9             P17 A6 V3 R2 Q L K Y F              P
  10    10             Y11 E7 D4 A2 N2 R2 V2 S I D         Y
  11    10             T17 P5 A3 R2 I S Q Y V K            T
  12     2             G32 K                               G
  13     5             P22 R6 L3 N I                       P
  14     3             C31 T A                             C
  15    12             K15 R4 Y2 M2 L2 -2 V G A I N F      K
  16     7             A22 G5 Q2 R K D F                   A
  17    12             R12 K5 A2 Y3 H2 S2 F2 L M T G P     R
  18     6             I21 M4 F3 L2 V2 T                   I
  19     7             I11 P10 R6 S2 K2 L Q                I
  20     5             R19 A7 S4 L2 Q                      R
  21     4             Y18 F13 W I                         Y
  22     6             F14 Y14 H2 A N S                    F
  23     2             Y32 F                               Y
  24     4             N26 K3 D3 S                         N
  25    10             A12 S5 Q3 P3 W3 L2 T2 K G R         A
  26     9             K16 A6 T2 E2 S2 R2 G H V            K
  27     5             A18 S8 K3 L2 T2                     A
  28     7             G13 K10 N5 Q2 R H M                 G
  29    10             L9 Q7 K7 A2 F2 R2 M G T N           L
  30     1             C33                                 C
  31     7             Q12 E11 L4 K2 V2 Y N                Q
  32    11             T12 P5 K4 Q3 E2 L2 G V S R A        T
  33     1             F33                                 F
  34    11             V11 I8 T3 D2 N2 Q2 F H P R K        V
  35     2             Y31 W2                              Y
  36     3             G27 S5 R                            G
  37     1             G33                                 G
  38     3             C31 T A                             C
  39     7             R13 G9 K4 Q3 D2 P M                 R
  40     2             G22 A11                             A
  41     3             N20 K11 D2                          K
  42     9             A11 R9 S4 G3 H2 D Q K N             R
  43     2             N31 G2                              N
  44     3             N21 R11 K                           N
  45     2             F32 Y                               F
  46     8             K24 E2 S2 D H V Y R                 K
  47     2             T19 S14                             S
  48     9             A11 I9 E4 T2 W2 L2 R K D            A
  49     7             E19 D6 A2 Q2 K2 T H                 E
  50     6             E16 D12 L2 M Q K                    D
  51     1             C33                                 C
  52     7             R13 M10 L3 E3 Q2 H V                M
  53     8             R21 Q3 E2 H2 C2 G K D               R
  54     7             T23 A3 V2 E2 I Y K                  T
  55     1             C33                                 C
  56     8             G15 V8 I3 E2 R2 A L S               G
  57     8             G19 V4 A3 P2 -2 R L N               G
  58     8             A11 -10 P3 K3 S2 Y2 R F             A
  59     9             -24 G2 Q E A Y S P R                -
  60     6             -28 Q R I G D                       -
  61     3             -31 T P                             -
  62     2             -32 D                               -
  63     2             -32 K                               -
  64     2             -32 S                               -

An entry of the form -n counts the n homologues that have no
residue (a gap) at that position.
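Each Contents entry is a compact tally. It consists of a one-letter amino-acid code, or '-' for a gap, optionally followed by a count, where no number means one occurrence. The hypothetical helper `parse_contents` below is a sketch, not part of the source. It expands an entry so that the Number Different AAs column and the row total (33 sequences in every row) can be checked.

```python
import re

def parse_contents(entry):
    """Expand a Table 15 'Contents' entry such as 'K15 R4 -2 V' into counts.

    A token is a one-letter code or '-' (gap), optionally followed by a count;
    spaces are ignored, so run-together entries like 'G2QEAYSPR' also parse.
    """
    counts = {}
    for symbol, n in re.findall(r"([A-Z-])(\d*)", entry.replace(" ", "")):
        counts[symbol] = counts.get(symbol, 0) + (int(n) if n else 1)
    return counts

row15 = parse_contents("K15 R4 Y2 M2 L2 -2 V G A I N F")
print(len(row15), sum(row15.values()))  # 12 33
```

The gap symbol is counted as one of the "different AAs", which is how the source rows 15 and 57 reach their totals.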
Table 16: Exposure in BPTI
Coordinates taken from Brookhaven Protein Data Bank entry 6PTI.

HEADER    PROTEINASE INHIBITOR (TRYPSIN)    13-MAY-87    6PTI
COMPND    BOVINE PANCREATIC TRYPSIN INHIBITOR
COMPND  2 (/BPTI$, CRYSTAL FORM /III$)
AUTHOR    A. WLODAWER

Solvent radius = 1.40
Atomic radii given in Table 7
Areas in Angstroms-squared.

                    Not covered by M/C  Not covered at all
Residue  Total area     area  fraction      area  fraction
ARG  1       342.45   205.09    0.5989    152.49    0.4453
PRO  2       239.12    92.65    0.3875     47.56    0.1989
ASP  3       272.39   158.77    0.5829    143.23    0.5258
PHE  4       311.33   137.82    0.4427     43.21    0.1388
CYS  5       241.06    48.36    0.2006      0.23    0.0010
LEU  6       280.98   151.45    0.5390    115.87    0.4124
GLU  7       291.39   128.91    0.4424     90.39    0.3102
PRO  8       236.12   128.71    0.5451     99.98    0.4234
PRO  9       236.09   109.82    0.4652     45.80    0.1940
TYR 10       330.97   153.63    0.4642     79.49    0.2402
THR 11       249.20    80.10    0.3214     64.99    0.2608
GLY 12       184.21    56.75    0.3081     23.05    0.1252
PRO 13       240.07   130.25    0.5426     75.27    0.3136
CYS 14       237.10    75.55    0.3186     53.52    0.2257
LYS 15       310.77   200.25    0.6444    192.00    0.6178
ALA 16       209.41    66.63    0.3182     45.59    0.2177
ARG 17       351.09   243.67    0.6940    201.48    0.5739
ILE 18       277.10   100.51    0.3627     58.95    0.2127
ILE 19       278.03   146.06    0.5254     96.05    0.3455
ARG 20       339.11   144.65    0.4266     43.81    0.1292
TYR 21       333.60   102.24    0.3065     69.67    0.2089
PHE 22       306.08    70.64    0.2308     23.01    0.0752
TYR 23       338.66    77.05    0.2275     17.34    0.0512
ASN 24       264.88    99.03    0.3739     38.69    0.1461
ALA 25       211.15    85.13    0.4032     48.20    0.2283
LYS 26       313.29   216.14    0.6899    202.84    0.6474
ALA 27       210.66    96.05    0.4560     54.78    0.2601
GLY 28       186.83    71.52    0.3828     32.09    0.1718
LEU 29       280.70   132.42    0.4718     93.61    0.3335
CYS 30       238.15    57.27    0.2405     19.33    0.0812
GLN 31       301.15   141.80    0.4709     82.64    0.2744
THR 32       251.26   138.17    0.5499     76.47    0.3043
PHE 33       304.27    59.79    0.1965     18.91    0.0622
VAL 34       251.56   109.78    0.4364     42.36    0.1684
TYR 35       332.64    80.52    0.2421     15.05    0.0452
GLY 36       187.06    11.90    0.0636      1.97    0.0105
GLY 37       185.28    84.26    0.4548     39.17    0.2114
CYS 38       234.56    73.64    0.3139     26.40    0.1125
ARG 39       417.13   304.62    0.7303    250.73    0.6011
ALA 40       209.53    94.01    0.4487     52.95    0.2527
LYS 41       314.60   166.23    0.5284    108.77    0.3457
ARG 42       349.06   232.83    0.6670    179.59    0.5145
ASN 43       266.47    38.53    0.1446      5.32    0.0200
ASN 44       269.65    91.08    0.3378     23.39    0.0867
PHE 45       313.22    69.73    0.2226     14.79    0.0472
LYS 46       309.83   217.18    0.7010    155.73    0.5026
SER 47       224.78    69.11    0.3075     24.80    0.1103
ALA 48       211.01    82.06    0.3889     31.07    0.1473
GLU 49       286.62   161.00    0.5617    100.01    0.3489
ASP 50       299.53   156.42    0.5222     95.96    0.3204
CYS 51       238.68    24.51    0.1027      0.00    0.0000
MET 52       293.05    89.48    0.3054     66.70    0.2276
ARG 53       356.20   224.61    0.6306    189.75    0.5327
THR 54       251.53   116.43    0.4629     51.64    0.2053
CYS 55       240.40    69.95    0.2910      0.00    0.0000
GLY 56       184.66    60.79    0.3292     32.78    0.1775
GLY 57       106.58    49.71    0.4664     38.28    0.3592
ALA 58       no position given in Protein Data Bank

"Total area" is the area measured by a rolling sphere of radius
1.4 A, where only the atoms within the residue are considered.
This takes account of conformation.

"Not covered by M/C" is the area measured by a rolling sphere of
radius 1.4 A where all main-chain atoms are considered; fraction
is the exposed area divided by the total area. Surface buried by
main-chain atoms is more definitely covered than is surface
covered by side-group atoms.

"Not covered at all" is the area measured by a rolling sphere of
radius 1.4 A where all atoms of the protein are considered.
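Each fraction column is the uncovered area divided by the residue's total area. The Python below is a minimal sketch that assumes surface residues are to be flagged from the "not covered at all" fraction. It recomputes that fraction for a handful of rows copied from the table. The 0.30 cutoff is a hypothetical choice for illustration, not a value from the source.

```python
# A few rows of Table 16: residue -> (total area, area not covered at all), A^2.
ROWS = {
    "ARG 1": (342.45, 152.49),
    "CYS 5": (241.06, 0.23),
    "LYS 15": (310.77, 192.00),
    "TYR 23": (338.66, 17.34),
    "ARG 39": (417.13, 250.73),
}

def exposed_fraction(total, uncovered):
    """Fraction of the residue's own surface still accessible in the protein."""
    return uncovered / total

CUTOFF = 0.30  # hypothetical threshold for calling a residue surface-exposed
exposed = [name for name, (total, uncovered) in ROWS.items()
           if exposed_fraction(total, uncovered) >= CUTOFF]
print(exposed)  # ['ARG 1', 'LYS 15', 'ARG 39']
```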
Table 17: Plasmids used in Detailed Example

Phage    Contents

LG1      M13mp18 with Ava II/Aat II/Acc I/Rsr II/Sau I adaptor

pLG2     LG1 with amp^R and ColE1 of pBR322 cloned into
         Aat II/Acc I sites

pLG3     pLG2 with Acc I site removed

pLG4     pLG3 with first part of osp-pbd gene cloned into
         Rsr II/Sau I sites, II/Asu II sites created

pLG5     pLG4 with second part of osp-pbd gene cloned into
         Avr II/Asu II sites, BssH II site created

pLG6     with third part of osp-pbd gene cloned into
         Asu II/BssH II sites, Bbe I site created

pLG7     pLG6 with last part of osp-pbd gene cloned into
         Bbe I/Asu II sites

pLG8     pLG7 with disabled osp-pbd gene, r length DNA.

pLG9     pLG7 mutated to display BPTI (V15BPTI)

pLG10    pLG8 + tet^R gene - amp^R gene

pLG11    pLG9 + tet^R gene - amp^R gene
Table 25: Annotated Sequence of ipbd gene

5'-C|GGA|CCG|TAT|CCA|GGC|TTT|ACA|CTT|TAT|   28
   Rsr II; -35
|GCT|TCC|GGC|TCG|TAT|AAT|GTG|TGG|   52
 -10
|AAT|TGT|GAG|CGG|ATA|ACA|ATT|   73
 lac operator
|CCT|AGG|AGG|CTC|ACT|   88
 Avr II; S.D.

  m   k   k   s   l   v   l   k   a   s
  1   2   3   4   5   6   7   8   9  10
|ATG|AAG|AAA|TCT|CTG|GTT|CTT|AAG|GCT|AGC|   118
 Afl II; Nhe I

  v   a   v   a   t   l   v   p   m   l
 11  12  13  14  15  16  17  18  19  20
|GTT|GCT|GTC|GCG|ACC|CTG|GTA|CCG|ATG|CTG|   148
 Nru I; Kpn I

  s   f   a   r   p   d   f   c   l   e
 21  22  23  24  25  26  27  28  29  30
|TCT|TTT|GCT|CGT|CCG|GAT|TTC|TGT|CTC|GAG|   178
 Acc III; Ava I; Xho I

  p   p   y   t   g   p   c   k   a   r
 31  32  33  34  35  36  37  38  39  40
|CCG|CCA|TAT|ACT|GGG|CCC|TGC|AAA|GCG|CGC|   208
 PflM I; Apa I; Dra II; Pss I; BssH II

  i   i   r   y   f   y   n   a   k
 41  42  43  44  45  46  47  48  49
|ATC|ATC|CGT|TAT|TTC|TAC|AAC|GCT|AAA|   235

  a   g   l   c   q   t   f   v   y   g   g
 50  51  52  53  54  55  56  57  58  59  60
|GCA|GGC|CTG|TGC|CAG|ACC|TTT|GTA|TAC|GGT|GGT|   268
 Stu I; Acc I; Xca I

  c   r   a   k   r   n   n   f   k
 61  62  63  64  65  66  67  68  69
|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA|   295
 Esp I

  s   a   e   d   c   m   r   t   c   g
 70  71  72  73  74  75  76  77  78  79
|TCG|GCC|GAA|GAT|TGC|ATG|CGT|ACC|TGC|GGT|   325
 Xma III; Sph I

  g   a   a   e   g   d   d
 80  81  82  83  84  85  86
|GGC|GCC|GCT|GAA|GGT|GAT|GAT|   346
 Bbe I; Nar I

  p   a   k   a   a
 87  88  89  90  91
|CCG|GCC|AAA|GCG|GCC|   361
 Sfi I

  f   n   s   l   q   a   s   a   t
 92  93  94  95  96  97  98  99  100
|TTT|AAC|TCT|CTG|CAA|GCT|TCT|GCT|ACC|   388
 Hind III

  e   y   i   g   y   a   w
 101 102 103 104 105 106 107
|GAA|TAT|ATC|GGT|TAC|GCG|TGG|   409
 Mlu I

  a   m   v   v   v
 108 109 110 111 112
|GCC|ATG|GTG|GTG|GTT|   424
 BstX I; Nco I

  i   v   g   a   t   i   g   i
 113 114 115 116 117 118 119 120
|ATC|GTT|GGT|GCT|ACC|ATC|GGT|ATC|   448

  k   l   f   k   k   f   t   s   k   a
 121 122 123 124 125 126 127 128 129 130
|AAA|CTG|TTT|AAG|AAA|TTT|ACT|TCG|AAA|GCG|   478
 Asu II

 131 132 133 134
|TCT|TAA|TAG|TGA|GGT|TAC|CAG|TCT|   502
 BstE II

|AAG|CCC|GCC|TAA|TGA|GCG|GGC|TTT|TTT|TTT|   532
 Trc terminator

|CCT|GAG|G-3'   539
 Sau I

Note the following enzyme equivalences.

Xma III  = Eag I
Acc III  = BspM II
Dra II   = EcoO109 I
Asu II   = BstB I
Sau I    = Bsu36 I
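The site annotations can be checked by scanning the sequence for each enzyme's recognition sequence. The sketch below does this for the codons of residues 1-30. The recognition sequences are the standard ones for these enzymes, supplied here rather than taken from the table. The offset of 89 assumes, from the running count of 88 at the end of the Avr II/S.D. line, that the ATG of residue 1 is nucleotide 89.

```python
# Codons for residues 1-30 of the ipbd gene (Table 25).
SEGMENT = ("ATGAAGAAATCTCTGGTTCTTAAGGCTAGC"   # residues 1-10
           "GTTGCTGTCGCGACCCTGGTACCGATGCTG"   # residues 11-20
           "TCTTTTGCTCGTCCGGATTTCTGTCTCGAG")  # residues 21-30
OFFSET = 89  # assumed position of SEGMENT[0] in the annotated sequence

# Standard recognition sequences; all six are palindromic, so one strand suffices.
SITES = {"Afl II": "CTTAAG", "Nhe I": "GCTAGC", "Nru I": "TCGCGA",
         "Kpn I": "GGTACC", "Acc III": "TCCGGA", "Xho I": "CTCGAG"}

def find_sites(dna, sites, offset=1):
    """Return {enzyme: [start positions]} for every exact match, numbered from offset."""
    hits = {}
    for name, motif in sites.items():
        starts = []
        i = dna.find(motif)
        while i != -1:
            starts.append(i + offset)
            i = dna.find(motif, i + 1)
        hits[name] = starts
    return hits

hits = find_sites(SEGMENT, SITES, OFFSET)
print(hits)
# {'Afl II': [107], 'Nhe I': [113], 'Nru I': [126], 'Kpn I': [136], 'Acc III': [160], 'Xho I': [173]}
```

Sites with an internal wildcard, such as Sau I (CCTNAGG), would need a regular expression instead of `str.find`.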
Table 27: DNA_synth.1

5'-|CCG|TCC|GTC|GGA|CCG|TAT|CCA|GGC|TTT|ACA|CTT|TAT|
   |GCT|TCC|GGC|TCG|TAT|AAT|GTG|TGG|
   |AAT|TGT|GAG|CGG|ATA|ACA|ATT|
             olig#4 = 3'-gt taa
   |CCT|AGG|
    gga tcc
               / 3' = olig#3
   |GCC|GCT|CCT|TCG|AAA|GCG|
    cgg cga gga agc ttt cgc
   |TCT|TAA|TAG|TGA|GGT|TAC|CAG|TCT|
    aga att atc act cca atg gtc aga
   |AAG|CCC|GCC|TAA|TGA|GCG|GGC|TTT|TTT|TTT|
    ttc ggg cgg att act cgc ccg aaa aaa aaa
   |CCT|GAG|GCA|GGT|GAG|CG
    gga ctc cgt cca ctc gc-5'

"Top" strand       99
"Bottom" strand    100
Overlap            23 (14 c/g and 9 a/t)
Net length         158
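The overlap composition can be checked directly. This sketch assumes, from the layout above, that the 3' end of olig#4 pairs with the CA of ...ACA ATT and that the 99-nt top strand ends with ...CCT TCG. On that reading the 23-nt overlap is CAATTCCTAGGGCCGCTCCTTCG. The sketch counts its bases and includes a reverse-complement helper for reading the lowercase bottom strand.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement each base and reverse, giving the other strand 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Assumed 23-nt overlap: CA ATT + CCT AGG + GCC GCT CCT TCG
OVERLAP = "CAATT" + "CCTAGG" + "GCCGCTCCTTCG"

gc = sum(base in "GC" for base in OVERLAP)
print(len(OVERLAP), gc, len(OVERLAP) - gc)  # 23 14 9
print(reverse_complement("CCTAGG"))          # CCTAGG (the Avr II site is palindromic)
print(reverse_complement("CAATT"))           # AATTG, i.e. 3'-gttaa-5' read 5'->3'
```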
Table 28: DNA seq2

5'-|gca|cca|acg|
    spacer
|CCT|AGG|AGG|CTC|ACT|
 Avr II; S.D.

  m   k   k   s   l   v   l   k   a   s
  1   2   3   4   5   6   7   8   9  10
|ATG|AAG|AAA|TCT|CTG|GTT|CTT|AAG|GCT|AGC|
 Afl II; Nhe I

  v   a   v   a   t   l   v   p   m   l
 11  12  13  14  15  16  17  18  19  20
|GTT|GCT|GTC|GCG|ACC|CTG|GTA|CCG|ATG|CTG|
 Nru I; Kpn I

  s   f   a   r   p   d   f   c   l   e
 21  22  23  24  25  26  27  28  29  30
|TCT|TTT|GCT|CGT|CCG|GAT|TTC|TGT|CTC|GAG|
 Acc III; Ava I; Xho I

  p   p   y   t   g   p   c   k   a   r
 31  32  33  34  35  36  37  38  39  40
|CCG|CCA|TAT|ACT|GGG|CCC|TGC|AAA|GCG|CGC|
 PflM I; Apa I; Dra II; Pss I; BssH II

  i   i   r
 41  42  43
|atc|atc|cgt|

  t   s   k
 127 128 129
|ACT|TCG|AAa|gcg|gct|gcg|-3'
 Asu II; spacer
Table 30: DNA seq3

              a   r
             39  40
|ccc|tgc|aca|GCG|CGC|
 spacer      BssH II

  i   i   r   y   f   y   n   a   k
 41  42  43  44  45  46  47  48  49
|ATC|ATC|CGT|TAT|TTC|TAC|AAC|GCT|AAA|

  a   g   l   c   q   t   f   v   y   g   g
 50  51  52  53  54  55  56  57  58  59  60
|GCA|GGC|CTG|TGC|CAG|ACC|TTT|GTA|TAC|GGT|GGT|
 Stu I; Acc I; Xca I

  c   r   a   k   r   n   n   f   k
 61  62  63  64  65  66  67  68  69
|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA|
 Esp I

  s   a   e   d   c   m   r   t   c   g
 70  71  72  73  74  75  76  77  78  79
|TCG|GCC|GAA|GAT|TGC|ATG|CGT|ACC|TGC|GGT|
 Xma III; Sph I

  g   a
 80  81
|GGC|GCC|gct|gaa|
 Bbe I; Nar I; spacer

      t   s   k
     127 128 129
|ttt|acT|TCG|AAa|gcg|tcg|ccg|-3'
 Asu II
Table 32: DNA_seq4

                 g   a   a   e   g   d   d
                80  81  82  83  84  85  86
5'-|cct|cgc|cct|GGC|GCC|GCT|GAA|GGT|GAT|GAT|
    spacer      Bbe I; Nar I

  p   a   k   a   a
 87  88  89  90  91
|CCG|GCC|AAA|GCG|GCC|
 Sfi I

  f   n   s   l   q   a   s   a   t
 92  93  94  95  96  97  98  99  100
|TTT|AAC|TCT|CTG|CAA|GCT|TCT|GCT|ACC|
 Hind III

  e   y   i   g   y   a   w
 101 102 103 104 105 106 107
|GAA|TAT|ATC|GGT|TAC|GCG|TGG|
 Mlu I

  a   m   v   v   v
 108 109 110 111 112
|GCC|ATG|GTG|GTG|GTT|
 BstX I; Nco I

  i   v   g   a   t   i   g   i
 113 114 115 116 117 118 119 120
|ATC|GTT|GGT|GCT|ACC|ATC|GGT|ATC|

  k   l   f   k   k   f   t   s   k
 121 122 123 124 125 126 127 128 129
|AAA|CTG|TTT|AAG|AAA|TTT|ACT|TCG|AAa|gcg|tcg|ggc|-3'
 Asu II; spacer
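The one-letter residue codes printed above each codon can be regenerated by translation. The Python sketch builds the standard genetic code from the conventional TCAG codon ordering. It then translates residues 80-91 of DNA_seq4, treating upper- and lowercase alike. `translate` is an illustrative helper, not something defined in the source.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate an in-frame DNA string codon by codon (case-insensitive)."""
    dna = dna.upper()
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# Residues 80-91 of DNA_seq4 (spacer removed).
print(translate("GGCGCCGCTGAAGGTGATGATCCGGCCAAAGCGGCC"))  # GAAEGDDPAKAA
```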
Table 34: Some interaction sets in BPTI


Res.   Number of
#      Diff. AAs   Contents   BPTI   1 2 3 4 5


-5 

2 

D -32 



— 





-4 

2 

E -32 



— 





-3 

5 

T P F Z -29 



— 





-2 

10 

Z3 R3 Q2 T2

H G L K E -18 

- 





-1 

10 

D4 T2 P2 Q2 

E G N K R -18 

— 





1 

10 

R21 A2 K2 H2 

P L I 

T G D 

R 




5 

2 

9 

P20 R4 A2 H2 

N E V 

F L 

P 



s 

5 

3 

10 

D15 K6 T3 R2 

P2 S Y G A L 

D 



4 

s 

4 

7 

F19 D4 L3 Y2 

I2 A2

S 

F 



s 

5 

5 

1 

C33 



C 



x

x

6 

10 

L11 E5 N4 K3

Q2 I2

Y2 D2 T R 

L 



4 


7 

5 

L18 E11 K2 S

Q 


E 


s 

4 


8 

7 

P26 H2 A2 I 

L G F 


P 


3 

4 


9 

9 

P17 A6 V3 R2 

Q L K 

Y F 

P 

s 

3 

4 


10 

10 

Y11 E7 D4 A2

N2 R2 

V2 S I D 

Y 

s 

s 

4 


11 

10 

T17 P5 A3 R2 

I S Q 

Y V K 

T 

1 s 

3 

4 


12 

2 

G32 K



G 

x

x

x
Table 34, continued.


13 

5 

P22 

R6 L3 N I 

P 

1 

s 4 s 

14 

3 

C31 

T A 

C 

1 

s s 5 

15 

12 

K15 

R4 Y2 M2 L2 -2 V G A I N F 

K 

1 

s 3 4 s 

16 

7 

A22 

G5 Q2 R K D F 

A 

1 

s s s 5 

17 

12 

R12 

K5 A2 Y3 H2 S2 F2 L M T G P 

R 

1 

2 3 s 

18 

6 

I21

M4 F3 L2 V2 T 

I 

1 

s s 5 

19 

7 

I11

P10 R6 S2 K2 L Q 

I 

1 

2 3 S 

20 

5 

R19 

A7 S4 L2 Q 

R 

s 

s s 5 

21 

4 

Y18 

F13 W I 

Y 


2 s s s 

22 

6 

F14 

Y14 H2 A N S 

F 


s 3 4 

23 

2 

Y32 

F 

Y 


s s 

24 

4 

N26 

K3 D3 S 

N 


S 3 

25 

10 

A12 

S5 Q3 P3 W3 L2 T2 K G R 

A 


s s 

26 

9 

K16 

A6 T2 E2 S2 R2 G H V 

K 


S 3 4 

27 

5 

A18 

S8 K3 L2 T2 

A 


2 3 4 

28 

7 

G13

K10 N5 Q2 R H M 

G 


2 S S 

29 

10 

L9 Q7 K7 A2 F2 R2 M G T N 

L 


2 3 

30 

1 

C33 


C 

x

x x

31 

7 

Q12 

E11 L4 K2 V2 Y N

Q 

2 

3 4 

32 

11 

T12 

P5 K4 Q3 E2 L2 G V S R A 

T 

2 

3 s 

33 

1 

F33 


F 

x x

x x

34 

11 

V11

I8 T3 D2 N2 Q2 F H P R K

V 

1 2 

3 s 

35 

2 

Y31 

W2 

Y 

s s 

s 5 

36 

3 

G27 

S5 R 

G 

1 


37 

1 

G33 


G 

x

x

38 

3 

C31 

T A 

C 

1 

s 5 

39 

7 

R13 

G9 K4 Q3 D2 P M 

R 

1 

4 s 
Table 34, continued.


40 

2 

G22

A11

A 

S S 

5 

41 

3 

N20 

K11 D2

K 

4 

s 

42 

9 

A11

R9 S4 G3 H2 D Q K N

R 

s 

5 

43 


2 

N31 

G2

N 


s 

44 

3 

N21 

R11 K

N 


s 

45 

2 

F32 

Y 

F 


s 

46 

8 

K24 

E2 S2 D H V Y R 

K 


5 

47 

2 

T19 

S14 

S 

s 

5 

48 

9 

A11

I9 E4 T2 W2 L2 R K D

A 

2 S 

s 

49 

7 

E19 

D6 A2 Q2 K2 T H 

E 

2 

s 

50 

6 

E16 

D12 L2 H Q K

D 

s 

5 

51 

1 

C33 


C 

x

x

52 

7 

R13 

M10 L3 E3 Q2 H V 

M 

2 

s 

53 

8 

R21 

Q3 E2 H2 C2 G K D

R 

s 

5 

54 

7 

T23 

A3 V2 E2 I Y K 

* 

T 


5 

55 

1 

C33 


C 


x

56 

8 

G15

V8 I3 E2 R2 A L S

G 



57 

8 

G19

V4 A3 P2 -2 R L N 

G 



58 

8 

A11

-10 P3 K3 S2 Y2 R F 

A 



59 

9 

-24 

G2 Q E A Y S P R

— 



60 

6 

-28 

Q R I G D

— 



61 

3 

-31 

T P 

- 



62 

2 

-32 

D 

— 



63 

2 

-32 

K 

- 



64 

2 

-32 

S 

-




s indicates secondary set

x indicates in or close to surface but buried and/or highly
conserved.
Table 35: Distances from Cbeta to Tip of Side Group
in Angstroms

Amino Acid type   Distance

A                 0.0
C (reduced)       1.8
D                 2.4
E                 3.5
F                 4.3
G                 -
H                 4.0
I                 2.5
K                 5.1
L                 2.6
M                 3.8
N
P                 2.4
Q                 3.5
R                 6.0
S                 1.5
T                 1.5
V                 1.5
W                 5.3
Y                 5.7

Notes: These distances were calculated for standard model parts
with all side groups fully extended.
Table 36: Distances, BPTI residue set #2
Distances in Angstroms between Cbetas.
A hypothetical Cbeta was added to each Glycine.

        R17   I19   Y21   A27   G28   L29   Q31   T32   V34   A48
I19     7.7
Y21    15.1   8.4
A27    22.6  17.1  12.2
G28    26.6  20.4  13.8   5.3
L29    22.5  15.8   9.6   5.1   5.2
Q31    16.1  10.4   6.8   6.8  10.6   6.8
T32    11.7   5.2   6.1  12.0  15.5  10.9   5.4
V34     5.6   6.5  11.6  17.6  21.7  18.0  11.4   8.2
A48    18.5  11.0   5.4  12.6  13.3   8.4   8.8   8.3  15.7
E49    22.0  14.7   8.9  16.9  16.1  12.2  13.9  13.3  19.8   5.5
M52    23.6  16.3   8.6  12.2  10.3   7.6  11.3  13.2  20.0   6.2
P9     14.0  11.3   9.0  12.2  15.4  13.3   7.9   9.2   8.7  13.9
T11     9.5  11.2  13.5  18.8  22.5  19.8  13.5  12.1   5.7  18.5
K15     7.9  14.6  20.1  27.4  31.3  27.9  21.4  18.1  10.3  24.6
A16     5.5  10.1  15.9  25.2  28.5  24.6  18.6  14.5   8.6  19.8
I18     6.1   6.0  11.2  21.3  24.4  20.2  14.7  10.4   7.0  15.0
R20    10.6   5.9   5.4  16.0  18.5  14.6   9.8   6.9   7.8  10.2
F22    15.6  10.9   5.6  10.5  12.8  10.3   6.2   8.1  10.8  10.3
N24    19.9  14.7   9.4   4.1   7.3   6.1   4.8  10.0  14.7  11.4
K26    24.4  20.1  15.2   5.4   7.7   9.8  10.1  15.3  19.0  17.0
C30    18.9  12.1   4.6   8.8   9.5   5.3   5.9   8.2  14.9   4.9
F33    10.8   7.4   7.7  12.6  16.4  13.0   6.6   5.6   5.5  12.2
Y35     8.4   7.4   9.4  18.4  21.4  17.9  12.2   9.5   5.8  14.4
S47    17.6  10.6   6.6  17.3  17.9  13.4  12.6  10.4  15.9   5.3
D50    20.0  13.6   7.2  17.2  16.8  13.5  13.5  12.9  17.6   7.6
C51    18.9  12.2   4.0  12.1  12.2   8.8   8.8   9.7  15.3   5.4
R53    25.4  18.6  11.0  17.2  15.0  13.0  15.7  16.7  22.3   9.7
R39    15.4  16.9  17.1  24.9  27.2  24.9  20.1  18.7  13.8  22.3

Table 36, continued.

Distances in Angstroms between Cbetas.
A hypothetical Cbeta was added to each Glycine.

        E49   M52    P9   T11   K15   A16   I18   R20   F22   N24
M52     6.1
P9     17.7  15.5
T11    22.1  21.5   7.2
K15    27.5  28.7  16.4   9.5
A16    22.2  24.2  14.9   9.8   6.2
I18    17.4  19.5  12.2   9.5  10.4   4.9
R20    13.0  13.8   8.0   9.4  14.9  10.6   6.2
F22    13.8  11.4   4.1  10.6  19.1  16.3  12.7   6.9
N24    15.6  11.2   8.4  15.3  24.1  21.9  18.2  12.7   6.6
K26    20.9  15.7  12.1  18.6  27.9  26.6  23.3  18.1  11.6   5.9
C30     8.7   5.6  10.6  16.6  24.1  20.2  15.7   9.8   6.8   6.9
F33    16.5  15.4   4.2   7.1  15.0  12.8   9.6   6.1   5.6   9.3
Y35    17.2  17.8   7.8   5.8  11.0   7.6   4.9   4.3   8.8  14.8
S47     4.7   9.1  15.3  18.5  23.1  17.6  12.8   9.1  12.0  15.3
D50     5.5   7.7  14.7  18.6  24.2  19.2  14.7   9.9  11.0  14.7
C51     7.1   5.4  11.0  16.4  23.5  19.2  14.6   8.7   6.9   9.6
R53     6.3   5.6  17.9  23.1  29.6  24.8  20.3  15.0  13.8  15.5
R39    23.9  24.0  13.0   9.5  12.0  11.8  12.5  12.8  14.7  20.8

        K26   C30   F33   Y35   S47   D50   C51   R53
C30    12.4
F33    13.9  10.1
Y35    19.5  13.5   6.4
S47    21.0   8.8  13.5  13.2
D50    20.1   8.6  14.3  13.7   5.0
C51    15.0   3.7  10.9  12.5   6.9   5.2
R53    19.9   9.9  18.2  18.8   9.4   5.8   7.4
R39    24.3  20.6  14.4   9.6  20.4  19.0  18.8  23.4
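Tables 35 and 36 together allow a rough check of which residues in set #2 could touch one another, and hence possibly a single target molecule. The sketch below uses a crude criterion that is an assumption of this example, not the source's stated method. Two side groups may meet if their Cbeta-Cbeta distance does not exceed the sum of their Cbeta-to-tip distances. Glycine is given a tip distance of 0.0, in keeping with the hypothetical Cbeta added in Table 36. `may_touch` is an illustrative name.

```python
# Cbeta-to-tip distances (Angstroms) from Table 35; G given 0.0 (assumption).
TIP = {"R": 6.0, "I": 2.5, "Y": 5.7, "K": 5.1, "A": 0.0, "G": 0.0}

# A few Cbeta-Cbeta distances (Angstroms) from Table 36.
CB = {("R17", "I19"): 7.7, ("R17", "Y21"): 15.1, ("K15", "R17"): 7.9,
      ("A27", "G28"): 5.3, ("Y21", "R53"): 11.0}

def may_touch(a, b):
    """Hypothetical reach test: True if the two fully extended side-group
    tips could meet, i.e. Cbeta distance <= sum of Cbeta-to-tip distances."""
    d = CB[(a, b)] if (a, b) in CB else CB[(b, a)]
    return d <= TIP[a[0]] + TIP[b[0]]

for pair in CB:
    print(pair, may_touch(*pair))
```

The test is optimistic because it ignores the directions in which side groups point. At best it can rule pairs out rather than confirm contacts.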
Table 37: vgDNA to vary BPTI set #2.1

                                 +
             g   p   c   k   a   X
            35  36  37  38  39  40
5'-|CAC|CCT|GGG|CCC|TGC|AAA|GCG|qfk|   208
    spacer  Apa I

      +
  i   X   r   y   f   y   n   a   k
 41  42  43  44  45  46  47  48  49
|ATC|qfk|CGT|TAT|TTC|TAC|AAC|GCT|AAA|   235

  +       +
  X   g   X   c   q   t   f
 50  51  52  53  54  55  56
|qfk|GGt|qfk|TGC|CAG|ACC|TTc|
                             / 3' = olig#27, 72 nts
78 nts

  +
  X   y   g   g
 57  58  59  60
|qfk|TAC|GGT|GGT|   268

Overlap = 12 (7 CG, 5 AT)

  c   r   a   k   r   n   n   f   k
 61  62  63  64  65  66  67  68  69
|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA|   295
 acg gca cga ttc gca ttg ttg aaa ttt
 Esp I

      +
  s   X   e   d   c   m
 70  71  72  73  74  75
|TCG|qfk|GAG|GAT|TGC|ATG|C   322
 agc **m ctc cta acg tac gca ccc acc-5'
 Sph I; spacer

k = equal parts of T and G; m = equal parts of C and A;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue           40     42     50     52     57     71
Possibilities     21 x 21 x 21 x 21 x 21 x 21 = 8.6 x 10^7
Abundance x 10
  of PPBD        .768   .271   .459   .671   .600   .459
Product = 1.77 x 10^-8
Parent = 1/(5.5 x 10^7); least favored = 1/(4.2 x 10^9)
Least favored one-amino-acid substitution from PPBD present
at 1 in 1.6 x 10^7
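The summary lines under the table are products over the varied residues, and the sketch below reproduces the first two. The first is the number of possible sequences, 21 at each qfk position. That 21 presumably counts the 20 amino acids plus the TAG stop that a ...k codon permits, which is an interpretation, not stated in the table. The second is the expected fraction of packages displaying the parental sequence. It takes the "Abundance x 10" row divided by 10 as the per-position abundance of the parental residue.

```python
from math import prod

# Six qfk positions (residues 40, 42, 50, 52, 57, 71), 21 possibilities each.
possibilities = prod([21] * 6)

# "Abundance x 10 of PPBD" row, divided by 10 to give per-position abundance.
abundance = [x / 10 for x in (0.768, 0.271, 0.459, 0.671, 0.600, 0.459)]
parent_fraction = prod(abundance)  # chance a package shows the parent at all six

print(f"{possibilities:.2e}")    # 8.58e+07
print(f"{parent_fraction:.3g}")  # 1.77e-08
```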
Table 38: Result of varying set#2 of BPTI 2

  l   e
 29  30
|CTC|GAG|   178
 Ava I; Xho I

  p   p   y   t   g   p   c   k   a   D
 31  32  33  34  35  36  37  38  39  40
|CCG|CCA|TAT|ACT|GGG|CCC|TGC|AAA|GCG|GAT|   208
 Apa I; Dra II; Pss I

  i   Q   r   y   f   y   n   a   k
 41  42  43  44  45  46  47  48  49
|ATC|CAG|CGT|TAT|TTC|TAC|AAC|GCT|AAA|   235

  E   g   L   c   q   t   f   S   y   g   g
 50  51  52  53  54  55  56  57  58  59  60
|GAG|GGC|CTG|TGC|CAG|ACC|TTT|TCG|TAC|GGT|GGT|   268

  c   r   a   k   r   n   n   f   k
 61  62  63  64  65  66  67  68  69
|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA|   295
 Esp I

  s   W   e   d   c   m   r   t   c   g
 70  71  72  73  74  75  76  77  78  79
|TCG|TGG|GAA|GAT|TGC|ATG|CGT|ACC|TGC|GGT|   325
 Sph I
Table 39: vgDNA to vary set# 2 BPTI 2.2

                           +
               g   p   c   X   a   D
              35  36  37  38  39  40
5'-cg gca cac|GGG|CCC|TGC|mrA|GCG|GAT|   208

  +       +   +
  X   Q   X   X   f   y   n   a   k
 41  42  43  44  45  46  47  48  49
|rvA|CAG|rvk|TwT|TTC|TAC|AAC|GCT|AAA|

      +           +   +
  E   X   L   c   X   X   f   S   y   g   g
 50  51  52  53  54  55  56  57  58  59  60
|GAG|qfk|CTG|TGC|qfk|qfk|TTT|TCG|TAC|GGT|GGT|   268
                                     cca cca
91 nts

Overlap = 15 (11 CG, 4 AT)

/ 3' = olig#29, 94 nts

  c   r   a   k   r   n   n   f   k
 61  62  63  64  65  66  67  68  69
|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA|   295
 acg gca cga ttc gca ttg ttg aaa ttt
 Esp I

          +
  s   W   X   d   c   m
 70  71  72  73  74  75
|TCG|TGG|qfk|GAT|TGC|ATG|C
 agc acc **m cta acg tac gcg acc tgc-5'
 Sph I; spacer

k = equal parts of T and G; v = equal parts of C, A, and G;
m = equal parts of C and A; r = equal parts of A and G;
w = equal parts of A and T;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue           38    41    43    44    51    54    55    72
Possibilities     4 x 4 x 9 x 2 x 21 x 21 x 21 x 21 = 6.2 x 10^7
Abundance x 10    2.5   2.5   .833  5.    .663  .397  .437  .602
Product = 2.3 x 10^-8
Parent = 1/(4.4 x 10^7); least favored = 1/(1.25 x 10^9)
Least favored one-amino-acid substitution from PPBD present
at 1 in 1.2 x 10^7
Table 40: Result of varying set#2 of BPTI 2.2

  l   e
 29  30
|CTC|GAG|   178
 Xho I

  p   p   y   t   g   p   c   E   a   D
 31  32  33  34  35  36  37  38  39  40
|CCG|CCA|TAT|ACT|GGG|CCC|TGC|GAG|GCG|GAT|   208
 Apa I

  V   Q   N   F   f   y   n   a   k
 41  42  43  44  45  46  47  48  49
|GTT|CAG|AAT|TTT|TTC|TAC|AAC|GCT|AAA|   235

  E   F   L   c   S   A   f   S   y   g   g
 50  51  52  53  54  55  56  57  58  59  60
|GAG|TTT|CTG|TGC|TCT|GCT|TTT|TCG|TAC|GGT|GGT|   268

  c   r   a   k   r   n   n   f   k
 61  62  63  64  65  66  67  68  69
|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA|   295
 Esp I

  s   W   Q   d   c   m   r   t   c   g
 70  71  72  73  74  75  76  77  78  79
|TCG|TGG|CAG|GAT|TGC|ATG|CGT|ACC|TGC|GGT|   325
 Sph I

  g   a
 80  81
|GGC|GCC|
 Bbe I; Nar I
Table 41: vgDNA set#2 of BPTI 2.3

67 nts olig#34 3'-g atg ttg cgg ttc

Overlap = 13 (7 CG, 6 AT)

  +       +           +       +
  X   F   X   c   S   X   f   X   y   g   g
 50  51  52  53  54  55  56  57  58  59  60
|vAG|TTT|nTk|TGC|TCT|qfk|TTT|qfk|TAC|GGT|GGT|   268
 btc aaa nam acg aga **m aaa **m atg cca cca

  c   r   a   k
 61  62  63  64
|TGC|CGT|GCT|AAG|C
 acg gca cga ttc gcg acc ggc
 Esp I; spacer

k = equal parts of T and G; m = equal parts of C and A;
w = equal parts of A and T; n = equal parts of A, C, G, T;
d = equal parts A, G, T; v = equal parts A, C, G;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue          32    34    40    44    50    52    55    57
Possibilities    6 x 6 x 21 x 6 x 3 x 5 x 21 x 21 = 3 x 10^7
Abundance x 10
  of PPBD       10/6  10/6  .545  10/6  10/3  30/8  .459  .701
Product = 1.01 x 10^-7
Parent = 1/(1 x 10^7); least favored = 1/(4 x 10^8)
Least favored one-amino-acid substitution from PPBD present
at 1 in 3 x 10^7
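For the positions not made with qfk, the Possibilities and Abundance entries follow from enumerating the codons in a degenerate mixture. The sketch below uses the equal-mixture symbols of the legend and reads residue 52's codon as nTk, since its bottom-strand complement is nam. It reproduces 3 possibilities for vAG at residue 50 and 5 for nTk at residue 52. Assuming the parent for this round carries E50 and L52, as in the Table 40 result, the parental abundances come out as 1/3 and 3/8. These match the 10/3 and 30/8 entries of the "Abundance x 10" row. The helper `encoded` is illustrative, and the unequal q and f mixtures are deliberately left out.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Equal-mixture symbols from the legend (plain bases map to themselves).
MIX = {"A": "A", "C": "C", "G": "G", "T": "T",
       "k": "TG", "m": "CA", "w": "AT", "n": "ACGT", "d": "AGT", "v": "ACG"}

def encoded(degenerate):
    """Amino-acid frequencies encoded by an equal-mixture degenerate codon."""
    codons = ["".join(c) for c in product(*(MIX[s] for s in degenerate))]
    freq = {}
    for codon in codons:
        aa = CODE[codon]
        freq[aa] = freq.get(aa, 0) + 1 / len(codons)
    return freq

print(sorted(encoded("vAG")))      # ['E', 'K', 'Q']
print(sorted(encoded("nTk")))      # ['F', 'I', 'L', 'M', 'V']
print(encoded("nTk")["L"])         # 0.375
```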
Table 42: Result of varying set#2 of BPTI 2.

  l   e
 29  30
|CTC|GAG|   178
 Ava I; Xho I

  p   E   y   Q   g   p   c   E   a   A
 31  32  33  34  35  36  37  38  39  40
|CCG|GAG|TAT|CAG|GGG|CCC|TGC|GAG|GCG|GCT|   208
 Apa I

  V   Q   N   W   f   y   n   a   k
 41  42  43  44  45  46  47  48  49
|GTT|CAG|AAT|TGG|TTC|TAC|AAC|GCT|AAA|   235

  Q   F   M   c   S   L   f   H   y   g   g
 50  51  52  53  54  55  56  57  58  59  60
|CAG|TTT|ATG|TGC|TCT|CTT|TTT|CAT|TAC|GGT|GGT|   268

  c   r   a   k   r   n   n   f   k
 61  62  63  64  65  66  67  68  69
|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA|   295
 Esp I

  s   W   Q   d   c   m   r   t   c   g
 70  71  72  73  74  75  76  77  78  79
|TCG|TGG|CAG|GAT|TGC|ATG|CGT|ACC|TGC|GGT|   325
 Sph I

  g   a
 80  81
|GGC|GCC|
 Bbe I; Nar I
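Tables 37 through 42 record successive rounds in which the winner of each round (Tables 38, 40, 42) becomes the parent for the next variegation. The toy Python loop below mimics that cycle of variegating, selecting, and re-variegating around the selected sequence, as set out in the claims that follow. Everything physical is replaced by a hypothetical scoring function. `HIDDEN_OPTIMUM`, `ALPHABET`, `affinity`, and `selection_round` are illustrative stand-ins, not part of the source. Exhaustive enumeration stands in for a variegated population, and "sequencing" is simply picking the best-scoring survivor.

```python
from itertools import product

# Hypothetical stand-ins: a hidden "ideal binder" and a reduced alphabet.
HIDDEN_OPTIMUM = {2: "W", 5: "R", 7: "Y"}
ALPHABET = "ARWYG"

def affinity(seq):
    """Toy binding score: number of positions matching the hidden optimum."""
    return sum(seq[i] == aa for i, aa in HIDDEN_OPTIMUM.items())

def variegate(parent, positions):
    """Step (a): one variant for every combination of ALPHABET at `positions`."""
    variants = []
    for combo in product(ALPHABET, repeat=len(positions)):
        s = list(parent)
        for pos, aa in zip(positions, combo):
            s[pos] = aa
        variants.append("".join(s))
    return variants

def selection_round(parent, positions):
    """Steps (b)-(e): keep variants binding at least as well as the parent,
    then 'sequence' the best one (ties go to the first found)."""
    floor = affinity(parent)
    binders = [s for s in variegate(parent, positions) if affinity(s) >= floor]
    return max(binders, key=affinity)

parent = "GGGGGGGGG"
history = [affinity(parent)]
for positions in ([2, 5], [7]):  # step (f): new variegation around the winner
    parent = selection_round(parent, positions)
    history.append(affinity(parent))
print(parent, history)  # GGWGGRGYG [0, 2, 3]
```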
1. A method of obtaining a protein that binds a
predetermined target that comprises:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an
outer-surface-displayed potential binding protein
other than a single chain antibody comprising (i) a
structural signal directing the display of the
protein on the outer surface of the package and
(ii) a potential binding domain for binding said
target, where a plurality of different potential
binding domains are displayed by said population,

b) causing the expression of said proteins and the
display of said proteins on the outer surface of
such packages,

c) contacting the packages with target material so
that the potential binding domains of the proteins
and the target material may interact, and
separating packages bearing a binding domain that
binds target material from packages that do not so
bind, and

d) recovering and replicating at least one package
bearing a successful binding domain,

preferably further comprising (e) determining the
amino acid sequence of a successful binding
domain,

and more preferably, further comprising (f)
preparing a new variegated population of
replicable genetic packages according to step (a),
the parental potential binding domain for the
potential binding domains of said new packages
being a successful binding domain whose sequence
was determined in step (e), and repeating steps
(b)-(e) with said new population.


2. The method of claim 1 wherein the population of
replicable genetic packages of step (a) is
obtained by:

i) preparing a variegated population of DNA
inserts each of which comprises a first
sequence which codes on expression for a potential
binding domain and a second sequence encoding a
signal directing that the encoded protein be
displayed on the outer surface of a chosen
replicable genetic package, and

ii) incorporating the resulting population of DNA
constructs into the chosen replicable genetic
packages to produce a population of replicable
genetic packages,

wherein preferably (1) said population is
characterized by the display of at least 10^5 but
not more than 10^9 different potential binding
domains and/or (2) from 1 in 10^4 to 1 in 10^9 of
the packages of said population display the same
potential binding domain.

3. The method of claim 1 wherein, in step (a), the
potential binding domains encoded by the nucleic
acid constructs are each related in sequence to a
parental potential binding domain by a limited
number of amino acid substitutions in the amino
acid sequence of said parental potential binding
domain, and, preferably the level of variegation
of the population is chosen such that the packages
displaying potential binding domains obtained by
single amino acid substitutions in the amino acid
sequence of the parental potential binding domain
are present in detectable amounts, and preferably
the initially chosen parental potential binding
protein has at least one stable binding domain and
said domain has a melting point of at least 60°C
and is stable over a pH range of at least 3.0-8.0.

4. The method of claim 1 wherein the displayable
potential binding protein is a chimeric protein,
and preferably, wherein said signal is provided
by a segment of said chimeric protein which is
essentially identical in amino acid sequence with
at least a functional portion of a natural outer
surface protein encoded by said genetic package or
a cell naturally infected by said genetic package,
said portion directing the transport of said
chimeric protein to the outer surface of the
genetic package.

5. The method of claim 3 wherein the parental
potential binding domain is initially chosen to be
one which is over 50% homologous with a domain of
a known protein, the latter domain having a
melting point of at least about 60°C.

6. The method of claim 5 wherein the initially chosen
parental binding protein does not preferentially
bind the predetermined target.

7. The method of claim 3, said target material
comprising one or more discrete molecules, said
parental potential binding domain being
characterized as a sequence of amino acids,
further comprising identifying an interaction set
of amino acids which are on the surface of the
parental potential binding domain and which can
all simultaneously touch a single molecule of the
target material, and obtaining potential binding
domains by substituting a different amino acid for
one or more of the amino acids in said interaction
set.

8. The method of claim 1 wherein the target material
is a non-macromolecular organic compound and the
potential binding domains comprise greater than
about 80 amino acid residues.

9. The method of claim 1 wherein the target material
is a non-macromolecular organic compound and the
potential binding domains comprise greater than
about 80 amino acid residues.


10. The method of claim 1 wherein the target material
is a mineral insoluble in aqueous solution.

11. The method of claim 1 wherein the target is an 
inorganic molecule or complex ion that is stable 
in aqueous solution. 

12. The method of claim 1 wherein the target is an
organometallic compound that is stable in aqueous
solution.

13. The method of claim 1 wherein the target material
is a general protease, wherein the immobilized
target material is first incubated with an
irreversible or covalent inhibitor to inactivate


14. The method of claim 1 wherein the replicable
genetic package is a cell or virus that can be
affinity separated and retain viability.

15. The method of claim 5 wherein the known binding
protein is an enzyme, the activity of which has a
deleterious effect on the replicable genetic
package, the host of the replicable genetic
package, or the target, wherein the majority of
the nucleic acid constructs code on expression for
an analogue of the known binding protein that does
not have such deleterious enzymatic activity.

16. The method of claim 1 wherein the target contains
ionizable groups and the pH of the solutions of
intended use and the pH of the affinity
separations are chosen so that both the potential
binding protein and the target remain stable.

17. The method of claim 1 wherein the target contains
ionizable groups, further comprising providing
counter ions to reduce electrostatic repulsion
between the potential binding protein and the
target.

18. The method of claim 1 wherein the initial
potential binding domain is picked so that, under
the conditions of intended use of the desired
binding protein and under the conditions of
affinity separation, the potential binding
domains and the target either have opposite
charges or one of them is neutral.

19. The method of claim 28 wherein the replicable
genetic package is a bacterial cell, such as
a strain of Escherichia coli.

20. The method of claim 1 wherein the replicable
genetic package is a bacterial spore, such as
a Bacillus endospore, more preferably an endospore
of a strain of B. subtilis.

21. The method of claim 1 wherein the replicable
genetic package is a bacteriophage, such as a
filamentous phage, preferably a derivative of an
M13 Escherichia coli bacteriophage or a derivative
of the Pseudomonas aeruginosa filamentous phage
Pf1.

22. The method of claim 21 wherein the signal is
provided by the coat protein of M13 or a segment
thereof embodying an outer surface transport
signal.

23. The method of claim 21 wherein the signal is
provided by the gene III protein of M13 or a
segment thereof embodying an outer surface
transport signal.


24. The method of claim 2 wherein the distribution of
nucleotides incorporated at each variegated codon
is chosen to yield substantially equal abundances
of acidic and basic amino acids, and, preferably,
the distribution of nucleotides incorporated at
each variegated codon is further chosen to yield
the largest value for the quantity
{(1 - abundance of stop codons) times (abundance
of the least abundant amino acid) / (abundance of
the most abundant amino acid)}.

25. The method of claim 1, wherein step (c) further
comprises contacting the packages with a second
material and isolating packages which do not bind
that second material.

26. The method of claim 1, wherein after obtaining a
novel binding protein recognizing a first
predetermined target, the novel binding protein is
chosen as a parental potential binding protein for
the isolation of a derivative protein which also
binds to a second predetermined target.

27. The method of claim 3 wherein the initially chosen
parental potential binding domain is selected from
the group consisting of (a) binding domains of
bovine pancreatic trypsin inhibitor, crambin,
ovomucoid, T4 lysozyme, hen egg white lysozyme,
ribonuclease, and azurin, and (b) domains at least
50% homologous with any of the foregoing domains
and which have a melting point of at least 60°C.

28. The method of claim 36 wherein the outer surface 
transport signal is provided by the lamB protein 
or a segment thereof embodying an outer surface 
transport signal. 

29. The method of claim 38 wherein the outer surface
transport signal is provided by the cotA, cotB,
cotC or cotD protein or a segment thereof
embodying an outer surface transport signal.

30. A chimeric protein comprising (i) at least a
segment of an outer surface protein of a cell or
virus, said segment providing an outer surface
transport signal recognized by said cell or virus,
and (ii) a domain foreign to said outer surface
protein, and, preferably, said foreign domain
binds to a target material not preferentially
bound by said outer surface protein.


31. A replicable genetic package which contains a
nucleic acid construct which codes on expression
for the chimeric protein of claim 30.

32. The method of claim 1 wherein in at least one
instance the amino acid residues varied in a first
assortment of potential binding domains are left
constant in the next assortment of potential
binding domains.


33. A method of preparing a population of variegated
DNA wherein the distribution of nucleotides
incorporated at each variegated codon is chosen to
yield substantially equal abundances of acidic and
basic amino acids, and, preferably, the
distribution of nucleotides incorporated at each
variegated codon is further chosen to yield the
largest value for the quantity
{(1 - abundance of stop codons) times (abundance
of the least abundant amino acid) / (abundance of
the most abundant amino acid)}.
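The figure of merit recited in this claim can be evaluated mechanically for any candidate nucleotide distribution. The following is a minimal Python sketch of one way to do so, not the applicant's procedure. It assumes the standard genetic code, independent nucleotide choice at each of the three codon positions, and counts D and E as acidic and K and R as basic (histidine is excluded by assumption). The function names are hypothetical, and the familiar NNK scheme is used only as an illustration, not as the distribution preferred in the specification.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_abundances(pos1, pos2, pos3):
    """Abundance of each amino acid (and '*') encoded by one variegated codon.

    pos1..pos3 map each base ("A", "C", "G", "T") to its fraction at that
    codon position; bases omitted from a mapping are taken to be absent.
    """
    abundances = {}
    for codon, aa in CODON_TABLE.items():
        p = pos1.get(codon[0], 0.0) * pos2.get(codon[1], 0.0) * pos3.get(codon[2], 0.0)
        abundances[aa] = abundances.get(aa, 0.0) + p
    return abundances


def figure_of_merit(abundances):
    """(1 - stop abundance) * (least abundant amino acid) / (most abundant amino acid)."""
    stop = abundances.get("*", 0.0)
    amino = [v for aa, v in abundances.items() if aa != "*"]
    return (1.0 - stop) * min(amino) / max(amino)


def acidic_and_basic(abundances):
    """Summed abundances of acidic (D, E) and basic (K, R) residues.

    Histidine is not counted as basic; that is an assumption of this sketch.
    """
    acidic = abundances["D"] + abundances["E"]
    basic = abundances["K"] + abundances["R"]
    return acidic, basic


# Illustration: the "NNK" scheme (N = equimolar A/C/G/T, K = equimolar G/T).
N = {b: 0.25 for b in "ACGT"}
K = {"G": 0.5, "T": 0.5}
nnk = codon_abundances(N, N, K)
print(f"stop codons: {nnk['*']:.5f}")
print(f"figure of merit: {figure_of_merit(nnk):.4f}")
print("acidic vs basic:", acidic_and_basic(nnk))
```

For NNK, one codon in 32 is a stop (TAG), the rarest amino acids occur at 1/32 and the most common (L, R, S) at 3/32, so the figure of merit is 31/96 (printed as 0.3229). Basic residues come out at 0.125 against 0.0625 for acidic ones, so NNK would not satisfy the equal-abundance condition; an optimizer over the per-position base fractions could use these two functions as its objective and constraint.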

34. The protein of claim 66, wherein the protein
comprises a first foreign domain recognizing a
first target material and a second foreign domain
recognizing a second target material.

35. The method of claim 3 wherein the initially chosen
parental potential binding domain is at least 50%
homologous with the binding domain of bovine
pancreatic trypsin inhibitor.